[Remote] Senior Technical Product Manager, Observability
Note: This is a remote position open to candidates in the USA. Vultr is on a mission to make high-performance cloud infrastructure easy to use, affordable, and locally accessible for enterprises and AI innovators around the world. The company is seeking a highly skilled and experienced Senior Technical Product Manager to own the Observability Platform, focusing on telemetry ingestion, querying, visualization, and alerting for large-scale GPU clusters.
Responsibilities
- Own the end-to-end Observability Platform roadmap across telemetry ingestion, querying, visualization, alerting, and retention for large-scale GPU clusters and multi-tenant cloud environments
- Define Vultr's observability strategy across bare metal, VMs, Kubernetes, and managed services, aligned to infrastructure roadmap, reliability goals, and customer experience
- Drive the customer-facing observability surface across dashboards, APIs, telemetry pipelines, and topology-aware insights
- Translate low-level signals across GPU, CPU, memory, storage, and network into actionable health views, alerts, and debugging workflows for customers
- Work closely with engineering on technical tradeoffs across metrics agents, collectors, data models, telemetry pipelines, APIs, and retention architecture
- Build products for distributed AI environments by understanding how training and inference workloads behave across nodes, clusters, schedulers, and network fabrics
- Define health models that help customers quickly identify degraded nodes, performance anomalies, and cluster bottlenecks at fleet scale
- Ensure new infrastructure and platform launches are observable by design through strong partnership with compute, network, and platform teams
- Stay current on modern observability stacks and AI infrastructure trends, including how GPU workloads change performance analysis, cost attribution, and operational workflows
Skills
- 7+ years of product management experience in cloud infrastructure, observability, monitoring, or developer platforms
- Deep understanding of observability and monitoring systems, including metrics, logging, tracing, alerting, and telemetry pipeline architecture
- Experience defining product strategy and roadmaps for platform or infrastructure products at scale
- Strong technical background, with the ability to engage with engineering on telemetry agents, data models, query engines, retention, and distributed systems
- Experience with GPU, AI/ML, or HPC infrastructure monitoring and the unique observability challenges of training and inference workloads
- Track record of shipping developer- and operator-facing products with measurable impact on reliability, time-to-detect, or operational efficiency
- Experience working with cross-functional teams (engineering, design, marketing, sales) in a fast-paced environment
- Excellent written and verbal communication skills, with the ability to translate complex technical concepts for diverse audiences
- Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent experience)
Benefits
- 100% company-paid insurance premiums for employee medical, dental, and vision plans
- 401(k) plan that matches 100% up to 4%, with immediate vesting
- Professional Development Reimbursement of $2,500 each year
- 11 Holidays + Paid Time Off Accrual + Rollover Plan
- Commitment matters to Vultr! Increased PTO at 3-year and 10-year anniversaries + 1-month paid sabbatical every 5 years + Anniversary Bonus each year
- $500 stipend for remote office setup in first year + $400 each following year
- Internet reimbursement up to $75 per month
- Gym membership reimbursement up to $50 per month
- Company-paid Wellable subscription