[Remote] Principal AI / Machine Learning Data Engineer - Remote or hybrid from MN or DC

Work from home Full-time role Hiring

Note: This is a remote job open to candidates in the USA. UnitedHealth Group is a global organization that delivers care aided by technology to help millions of people live healthier lives. The Principal AI Data Engineer will design and build end-to-end AI pipelines for large-scale unstructured data, enabling advanced analytics and Generative AI. This role involves transforming complex datasets into AI-ready data products and supporting machine learning workflows.

Responsibilities

  • Design, develop, and maintain scalable data pipelines and data platforms supporting analytics, machine learning, and AI use cases
  • Build and optimize ingestion frameworks for large-scale structured and unstructured data, including streaming and event-driven sources
  • Partner with cross-functional stakeholders to understand evolving data and AI needs and define long-term technical solutions
  • Enable and support machine learning and AI workflows, including feature engineering, data preparation, and model deployment support
  • Drive strategic initiatives around Generative AI, data quality, observability, lineage, and governance
  • Develop and maintain frameworks that support rapid experimentation and deployment of AI/ML solutions
  • Introduce and evolve best practices in data modeling, orchestration, testing, and monitoring
  • Identify and champion opportunities for platform scalability, performance optimization, and cost efficiency
  • Collaborate with product, analytics, and infrastructure teams to deliver high-impact data and AI solutions
  • Build and maintain reusable parsing, enrichment, analytic, and service libraries to accelerate delivery across teams
  • Work comfortably under time-sensitive conditions while ensuring thoroughness
  • Maintain high ethical standards and the ability to remain objective and confidential
  • Build and operate production data platforms and pipelines across batch and streaming workloads
  • Work hands-on in Python and SQL, and with JVM languages (Java/Scala) in Spark ecosystems
  • Apply distributed processing and lakehouse/warehouse patterns (e.g., Spark/PySpark, Databricks, Snowflake)
  • Build pipelines for OCR, document parsing, and text extraction from image-based or scanned data sources
  • Enable Generative AI solutions in production (e.g., RAG-style architectures), including retrieval patterns and evaluation/monitoring practices
  • Take knowledge-centric data approaches (e.g., metadata-driven systems, entity resolution, and/or graph concepts) to improve discoverability and downstream analytics
  • Bring a data quality, observability, and monitoring mindset (profiling, validation, alerting, and reliability improvements)
  • Handle orchestration, CI/CD, containerization, and infrastructure-as-code (e.g., Airflow, GitHub Actions, Docker, Terraform, Kubernetes)
  • Work in the cloud (AWS, Azure, and/or GCP), including secure handling of sensitive data (PII/PHI) and collaboration with compliance partners
  • Lead through influence, mentor engineers, and translate ambiguous problems into scalable technical roadmaps

Skills

  • Bachelor's degree or equivalent experience
  • 5+ years of experience designing, building, and operating scalable data pipelines and platforms (batch + streaming)
  • 2+ years of experience deploying Generative AI solutions to production (e.g., RAG, LLM-powered pipelines, semantic search)
  • Proven hands-on development experience in Python and SQL, with experience in Spark/PySpark and Databricks (or similar distributed platforms)
  • Experience building ingestion and processing frameworks for unstructured data (OCR, documents, images), including parsing and enrichment
  • Experience with cloud platforms (AWS/Azure/GCP), DevOps/CI/CD, and infrastructure-as-code, including secure handling of sensitive data (PII/PHI)
  • Proven ability to design scalable solutions, implement data quality/observability practices, and collaborate across stakeholders
  • Experience with cloud platforms such as AWS, Azure, or Google Cloud, including managed data services
  • Experience with streaming and event-driven architectures (e.g., Kafka, Kinesis, Event Hubs)
  • Experience with data quality and validation frameworks (e.g., Great Expectations, Deequ) and/or data observability tooling
  • Experience enabling MLOps practices (e.g., feature stores, model registries, experiment tracking, deployment automation)
  • Experience with lakehouse architectures, Delta Lake, and advanced Spark optimization/performance tuning
  • Experience with data visualization tools and libraries such as Plotly, seaborn, and Chart.js
  • Experience with machine learning and predictive analytics
  • Familiarity with security and privacy concepts for data platforms (e.g., least privilege, PII/PHI handling) and working with compliance partners
  • Solid hands-on engineering in Python and SQL; familiarity with JVM languages (Java/Scala) in Spark ecosystems

Benefits

  • A comprehensive benefits package
  • Incentive and recognition programs
  • Equity stock purchase
  • 401k contribution (all benefits are subject to eligibility requirements)

Company Overview

  • UnitedHealth Group is a medical insurance company that offers health technology, patient checkups, and pharmacy services. It was founded in 1977 and is headquartered in Minneapolis, Minnesota, USA, with a workforce of more than 10,000 employees. Its website is https://www.unitedhealthgroup.com/.