[Remote] Principal AI / Machine Learning Data Engineer - Remote or hybrid from MN or DC

Work from home Full-time role Hiring

Note: This is a remote job open to candidates in the USA. Dice is a global organization that delivers care, aided by technology, to help millions of people live healthier lives. The Principal AI Data Engineer will design and build end-to-end AI pipelines for large-scale unstructured data, enabling advanced analytics and Generative AI. This role focuses on transforming complex datasets into AI-ready data products and building modern data pipelines.

Responsibilities

  • Design, develop, and maintain scalable data pipelines and data platforms supporting analytics, machine learning, and AI use cases
  • Build and optimize ingestion frameworks for large-scale structured and unstructured data, including streaming and event-driven sources
  • Partner with cross-functional stakeholders to understand evolving data and AI needs and define long-term technical solutions
  • Enable and support machine learning and AI workflows, including feature engineering, data preparation, and model deployment support
  • Drive strategic initiatives around Generative AI, data quality, observability, lineage, and governance
  • Develop and maintain frameworks that support rapid experimentation and deployment of AI/ML solutions
  • Introduce and evolve best practices in data modeling, orchestration, testing, and monitoring
  • Identify and champion opportunities for platform scalability, performance optimization, and cost efficiency
  • Collaborate with product, analytics, and infrastructure teams to deliver high-impact data and AI solutions
  • Build and maintain reusable parsing, enrichment, analytic, and service libraries to accelerate delivery across teams
  • Work comfortably under time-sensitive conditions while ensuring thoroughness
  • Maintain high ethical standards and the ability to remain objective and confidential
  • Build and operate production data platforms and pipelines across batch and streaming workloads
  • Do hands-on engineering in Python and SQL, and work with JVM languages (Java/Scala) in Spark ecosystems
  • Apply distributed processing and lakehouse/warehouse patterns (e.g., Spark/PySpark, Databricks, Snowflake)
  • Build pipelines for OCR, document parsing, and text extraction from image-based or scanned data sources
  • Enable Generative AI solutions in production (e.g., RAG-style architectures), including retrieval patterns and evaluation/monitoring practices
  • Take knowledge-centric data approaches (e.g., metadata-driven systems, entity resolution, and/or graph concepts) to improve discoverability and downstream analytics
  • Bring a data quality, observability, and monitoring mindset (profiling, validation, alerting, and reliability improvements)
  • Handle orchestration, CI/CD, containerization, and infrastructure-as-code (e.g., Airflow, GitHub Actions, Docker, Terraform, Kubernetes)
  • Work in the cloud (AWS, Azure, and/or Google Cloud Platform), including secure handling of sensitive data (PII/PHI) and collaboration with compliance partners
  • Lead through influence, mentor engineers, and translate ambiguous problems into scalable technical roadmaps

Skills

  • Bachelor's degree or equivalent experience
  • 5+ years of experience designing, building, and operating scalable data pipelines and platforms (batch + streaming)
  • 2+ years of experience deploying Generative AI solutions to production (e.g., RAG, LLM-powered pipelines, semantic search)
  • Proven, solid hands-on development experience in Python and SQL, with experience in Spark/PySpark and Databricks (or similar distributed platforms)
  • Experience building ingestion and processing frameworks for unstructured data (OCR, documents, images), including parsing and enrichment
  • Experience with cloud platforms (AWS/Azure/Google Cloud Platform), DevOps/CI/CD, and infrastructure-as-code, including secure handling of sensitive data (PII/PHI)
  • Proven ability to design scalable solutions, implement data quality/observability practices, and collaborate across stakeholders
  • Experience with cloud platforms such as AWS, Azure, or Google Cloud, including managed data services
  • Experience with streaming and event-driven architectures (e.g., Kafka, Kinesis, Event Hubs)
  • Experience with data quality and validation frameworks (e.g., Great Expectations, Deequ) and/or data observability tooling
  • Experience enabling MLOps practices (e.g., feature stores, model registries, experiment tracking, deployment automation)
  • Experience with lakehouse architectures, Delta Lake, and advanced Spark optimization/performance tuning
  • Experience with data visualization tools and libraries such as Plotly, seaborn, and Chart.js
  • Experience with machine learning and predictive analytics
  • Familiarity with security and privacy concepts for data platforms (e.g., least privilege, PII/PHI handling) and working with compliance partners
  • Solid hands-on engineering in Python and SQL; familiarity with JVM languages (Java/Scala) in Spark ecosystems

Benefits

  • A comprehensive benefits package
  • Incentive and recognition programs
  • Equity stock purchase
  • 401k contribution (all benefits are subject to eligibility requirements)

Company Overview

  • Dice is a job-searching platform for technology professionals. It is a sub-organization of DHI Group. It was founded in 1990, and is headquartered in Santa Clara, California, USA, with a workforce of 201-500 employees. Its website is http://www.dice.com.

Company H1B Sponsorship

  • Dice has a track record of offering H1B sponsorships, with 2 in 2022, 4 in 2021, and 5 in 2020. Please note that this does not guarantee sponsorship for this specific role.