[Remote] Senior AI Agent & Evaluations Engineer

Work from home | Full-time role | Hiring

Note: This is a remote position open to candidates in the USA. Vacatia is building the future of vacation ownership, with a focus on transforming the industry through AI. The company is seeking a Senior AI Agent & Evaluations Engineer to design and improve AI agents that directly impact customer experiences and operational efficiency, while owning the intelligence layer behind these systems.

Responsibilities

Design, refine, and optimize prompts, tool definitions, routing logic, and decision-making behavior across Vacatia's AI agent ecosystem
Build and maintain evaluation frameworks, golden datasets, grading systems, and regression testing pipelines that measure agent quality and reliability
Develop guardrails and safe-failure mechanisms that ensure agents operate responsibly in customer-facing and financially sensitive workflows
Monitor production performance, investigate failures, identify edge cases, and continuously improve agent outcomes through data-driven iteration
Partner with business stakeholders to translate policies, operational requirements, and domain expertise into measurable agent behavior
Collaborate with engineering teams to define context requirements, tool contracts, and integration specifications that support agent success
Create scalable frameworks and reusable patterns for deploying AI agents across new business workflows and use cases
Establish best practices for prompt engineering, evaluation methodologies, observability, and agent operations

Skills

Proven experience shipping and owning production AI agents or LLM-powered systems beyond proof-of-concept environments
Deep expertise in prompt engineering, including system prompts, tool usage, context management, output constraints, and agent behavior design
Hands-on experience building evaluation frameworks using golden datasets, scoring rubrics, LLM-as-judge methodologies, and regression testing
Strong familiarity with modern AI development tools such as Claude Code, Codex, or similar coding agents
Experience with agent observability and evaluation platforms such as LangSmith, Langfuse, Arize, Galileo, or comparable solutions
Ability to distinguish prompt issues from data, tooling, model, or evaluation failures and systematically improve agent performance
Strong written and verbal communication skills with the ability to work effectively across engineering and business teams
Demonstrated ownership mindset with a passion for building reliable, measurable, and continuously improving AI systems
Experience building agents that process communication-based workflows including emails, support tickets, chat interactions, or transcripts
Experience with multiple agent frameworks and a practical understanding of their tradeoffs
Familiarity with the evolving LLM landscape and model selection strategies
Experience designing and implementing end-to-end evaluation pipelines and agent operations workflows
Production experience with online evaluation systems and automated scoring of live traffic
Experience integrating AI systems with Salesforce, AWS Connect, or customer engagement platforms
Background in customer-facing industries where accuracy, compliance, and communication quality are critical
Contributions to open-source projects, technical writing, or public thought leadership in AI, prompt engineering, or agent development

Company Overview

Vacatia is the resort marketplace for vacationing families, with a mission to make family vacations better. It was founded in 2013 and is headquartered in Mill Valley, California, USA, with a workforce of 1001-5000 employees. Its website is https://vacatia.com.

Company H1B Sponsorship

Vacatia has a track record of offering H1B sponsorships, with 2 in 2025 and 1 in 2022. Please note that this does not guarantee sponsorship for this specific role.

