[Remote] Principal AIOps Engineer

Work from home, full-time role

Note: This is a remote position open to candidates in the USA. CVS Health is seeking a Principal AIOps Engineer to lead enterprise-scale AIOps strategy, automation, and observability. The role focuses on modernizing IT operations through advanced technologies to enhance reliability and efficiency.

Responsibilities

  • Lead the AIOps strategy, roadmap, and operating model (intake, triage, automation lifecycle, KPIs) to measurably improve MTTR, alert quality, and operational efficiency
  • Own the observability-to-AIOps pipeline (metrics, logs, traces, events) and drive standardization of telemetry, service health models, and actionable alerting across teams and platforms
  • Design and implement event intelligence: correlation, deduplication, suppression, anomaly detection, incident clustering, and probable-cause analysis using topology/CMDB context
  • Advise operations, service owners, and leadership stakeholders; lead change enablement, adoption, and value measurement for AIOps and agentic automation across the organization
  • Develop ServiceNow-centric AIOps integrations (ITSM + ITOM/Event Management where applicable): event ingestion, alert-to-incident policies, enrichment, assignment/routing, approvals, change workflows, and closure updates for auditable closed-loop ops
  • Establish governance for operational AI (risk controls, approvals, auditability, data access, prompt/response logging, evaluation, and continuous improvement) in partnership with security, compliance, and operations
  • Build and operationalize agentic AI workflows for incident triage and resolution: signal summarization, similar-incident retrieval, knowledge article drafting, ticket updates, stakeholder communications, and human-in-the-loop remediation
  • Enable closed-loop automation and self-healing by connecting AIOps detections to orchestrated actions (runbooks/workflows), with clear approvals, safety checks, and rollback paths
  • Partner with NOC/SOC, infrastructure, and application owners to onboard services into AIOps, define service models, and improve signal quality, escalation paths, and operational readiness
  • Create enablement materials (playbooks, operating procedures, dashboards) and coach teams on AIOps practices, agentic AI usage, and responsible automation

Skills

  • 10+ years of experience in SRE or production operations supporting highly available services, along with experience working in a Product model
  • Proven technical leadership: ability to set direction, lead cross-team initiatives, and advise stakeholders through architecture reviews, tradeoffs, and operational readiness
  • Strong programming/scripting skills (Python preferred) and experience building automation, integrations, and APIs
  • Experience integrating observability platforms and event sources across hybrid environments (cloud/on-prem) and operating production-grade monitoring/event management at scale
  • Strong ServiceNow experience as an ITSM system of record (Incident/Problem/Change; CMDB/asset concepts). Ability to build and operate integrations at scale (REST, webhooks, event management) to support automation and auditability
  • Automation & Integration Engineering: Python (preferred) for automation and data/ML pipelines; experience building integrations, services, and operational tooling
  • Workflow orchestration and integrations (ServiceNow APIs, event pipelines, runbook automation) with strong reliability, security, and auditability practices
  • AIOps, ITSM/ITOM (ServiceNow) & Agentic AI Ecosystem:
  • Observability: Prometheus/Grafana, OpenTelemetry, ELK/Splunk/Datadog (or equivalent)
  • ServiceNow ITSM/ITOM: Incident/Problem/Change, CMDB/service mapping concepts, and Event Management/AIOps integrations (where applicable)
  • Agentic AI frameworks: building tool-using agents, retrieval workflows, prompt/response logging, evaluation, and guardrails
  • Operational ML/Analytics: anomaly detection and time-series analysis, correlation approaches, and model/agent evaluation & monitoring in production
  • Demonstrated experience applying machine learning and/or LLM-based approaches to operational problems (noise reduction, correlation, anomaly detection, summarization, and assisted remediation) in production environments
  • Experience building an agentic AI platform/ecosystem (shared tools, standardized patterns, evaluation, and guardrails) and enabling multiple teams to safely deliver automations
  • Familiarity with ServiceNow ITOM / Event Management / AIOps capabilities (or equivalent) and integrating observability signals into ITSM workflows
  • Strong Linux and networking fundamentals (TCP/IP, DNS, TLS, load balancing) and ability to troubleshoot distributed systems end-to-end
  • DevOps or platform engineering experience supporting highly available services, along with experience working in a Product model
  • Excellent communication skills with the ability to lead incident bridges, write clear postmortems, and influence reliability improvements across teams

Benefits

  • CVS Health bonus, commission or short-term incentive program
  • Award target in the company’s equity award program
  • Comprehensive benefits package designed to support the physical, emotional, and financial well‑being of colleagues and their families
  • Medical, dental, and vision coverage
  • Paid time off
  • Retirement savings options
  • Wellness programs
  • Other resources, based on eligibility

Company Overview

  • CVS Health is a health solutions company that provides integrated healthcare services to its members. It was founded in 1963 and is headquartered in Woonsocket, Rhode Island, USA, with a workforce of 10,001+ employees. Its website is https://www.cvshealth.com/.

Company H1B Sponsorship

  • CVS Health has a track record of offering H1B sponsorship, with 1 sponsorship in 2022. Please note that this does not guarantee sponsorship for this specific role.