[Remote] Senior Machine Learning Engineer - Agentic AI

Work from home, full-time role

Note: This is a remote position open to candidates in the USA. The UT MD Anderson Cancer Center, a leading institution in cancer treatment and research, is seeking a Senior Machine Learning Engineer – Agentic AI. This role involves designing and operating enterprise-scale agentic AI platforms that enable safe, governed deployment of autonomous AI systems within a regulated healthcare environment.

Responsibilities

Lead the design, evolution, and operation of the enterprise agentic AI platform in collaboration with enterprise architects and platform ML engineers
Build platform components that enable interoperability between first‑party and third‑party agents, including identity, state, memory, tool access, orchestration, auditability, and policy enforcement
Define and document standardized integration patterns connecting agents with enterprise business systems, data platforms, APIs, and health IT systems
Provide reusable platform services, reference implementations, and SDKs that reduce risk and accelerate delivery for applied teams
Design and operate validation and de‑risking frameworks, including simulation, sandboxing, shadow execution, canary releases, and continuous behavior monitoring
Establish and enforce platform standards for agent development, including interfaces, execution contracts, evaluation hooks, safety constraints, and observability requirements
Participate in platform governance, release coordination, and incident response, supporting investigation and remediation of agent‑related failures
Implement platform safeguards such as fallback mechanisms, rollback strategies, approval gates, rate limiting, audit trails, and kill‑switch capabilities
Partner with software engineering, security, IT, and health IT stakeholders to deploy agentic AI capabilities in secure enterprise environments
Support responsible AI practices through traceability of prompts, policies, tools, models, and agent actions, and through documentation of known failure modes and limitations

Skills

Bachelor's degree in Computer Science, Software Engineering, Data Science, Physics, Math & Statistics, or another related engineering discipline
Five years of experience in machine learning engineering, data science, data engineering, and/or software engineering
With Master's degree, three years' experience required
With PhD, one year of experience required
Experience building AI or ML platforms that serve multiple downstream teams and production workloads
Strong proficiency in Python and integration of modern ML frameworks (e.g., PyTorch) with large language models and agent systems
Hands‑on experience with agentic AI frameworks such as LangGraph, LangChain, AutoGen, CrewAI, Semantic Kernel, or equivalent
Working knowledge of agentic AI protocols and interoperability standards (e.g., MCP, agent‑to‑agent communication, structured tool invocation)
Experience implementing planner‑executor loops, hierarchical agents, and multi‑agent coordination patterns
Familiarity with workflow orchestration tools (Airflow, Prefect, Temporal) and distributed execution frameworks (Ray or equivalent)
Experience deploying containerized AI platforms using Kubernetes in enterprise cloud environments with lineage, auditability, and controlled promotion to production
Ability to reason at the systems and platform level, balancing safety, performance, flexibility, and usability
Experience designing quantitative evaluation strategies for agentic systems, including success rates, latency, cost, recovery behavior, and safety metrics
Strong understanding of enterprise data governance, security, and privacy requirements, including healthcare and health IT considerations
Ability to identify systemic risks stemming from agent autonomy, non‑determinism, tool access, and multi‑agent interactions
Experience analyzing failure modes caused by prompt drift, model updates, tool changes, and cross‑system dependencies
Collaborate effectively with architects, applied MLEs, data scientists, software engineers, and IT partners
Produce clear documentation covering platform architecture, APIs, integration patterns, validation frameworks, and operational runbooks
Communicate platform capabilities, risks, and limitations to leadership and partner teams
Contribute to internal standards and shared practices that improve safety, scalability, and consistency of agentic AI development
Provide hands‑on technical guidance, mentorship, and troubleshooting support to platform adopters
Present technical and non‑technical concepts clearly in meetings and institutional forums
Experience designing, deploying, and maintaining agentic AI systems that operate autonomously and collaboratively across distributed environments
Experience in monitoring and troubleshooting autonomous agents post-deployment, including performance degradation, clinical incidents, model updates, or corrective actions
Experience raising the technical bar for team members, such as establishing reproducibility practices, review standards, or shared patterns
Experience technically evaluating third-party agentic AI platforms within clinical workflows

Benefits

Paid medical benefits
Paid time off (PTO)
Strong retirement plans
Tuition benefits
Educational opportunities
Individual and team recognition
Referral Bonus Available?

Company Overview

The University of Texas MD Anderson Cancer Center is one of the world’s most respected centers devoted exclusively to cancer patient care, research, education, and prevention. It was founded in 1941 and is headquartered in Houston, Texas, USA, with a workforce of more than 10,000 employees. Its website is https://www.mdanderson.org/.

