Senior AI Quality Engineer (LLM Evaluation & Automation) 1754

Work from home, full-time role

This is a remote position. The Senior AI Quality Engineer owns the eval harness and quality gate from the beginning. This role replaces the old late-stage “Evals Specialist” model with a standing owner for measurable agent quality.

Key Responsibilities

Build and maintain the MVP eval harness: golden tasks, exception tasks, scorecard metrics, and regression packs.
Wire evals into CI so quality regressions fail builds and releases.
Define and maintain release-gate thresholds with Product and the Tech Lead.
Lay the groundwork for later adversarial and drift testing without overbuilding the MVP scope.

Must-Have Qualifications

Experience evaluating ML, LLM, or non-deterministic systems.
Strong test and benchmark design capability.
Comfort working with noisy metrics, thresholds, and probabilistic behavior.
Good scripting and automation skills.

AI-First Expectations

Uses AI to generate candidate eval cases and failure hypotheses, but never confuses generated tests with validated quality.
Approaches AI quality as an operating system, not a QA afterthought.

What Success Looks Like in the First 90 Days

The first reference agent has a published scorecard and gated eval path.
Golden and exception tests run automatically.
The team can explain what “good enough to ship” means in measurable terms.
