[Remote] QA Engineer, AI
Note: This is a remote position open to candidates in the USA. HirePlace is seeking an experienced QA Engineer with a strong background in artificial intelligence and machine learning systems. The role focuses on ensuring the quality, reliability, and consistency of AI-driven features, particularly those powered by large language models.
Responsibilities
- Design and execute testing strategies tailored to AI/LLM-based features, including prompt validation, regression testing, and output evaluation
- Build and maintain automated evaluation pipelines, including curated datasets and scoring frameworks to detect quality degradation over time
- Conduct exploratory and black-box testing across platforms, with a focus on edge cases, failure modes, and real-world usage scenarios
- Establish and track quality metrics such as accuracy, relevance, consistency, performance, and cost efficiency
- Collaborate with engineers, product stakeholders, and AI specialists to define expected system behavior and acceptable output ranges
- Diagnose and categorize issues across different layers, including prompts, models, data retrieval, and system integrations
- Contribute to discussions around testability, system risks, and improvements to guardrails and prompting strategies
- Help scale QA processes through improved automation, tooling, and evaluation coverage as the AI product ecosystem evolves
Skills
- 5+ years of experience in software quality assurance
- Minimum 1 year of hands-on testing experience with AI/ML systems, especially LLM-powered applications
- Strong understanding of QA methodologies across both traditional and probabilistic systems
- Experience with LLM workflows, including prompt design, retrieval-augmented generation (RAG), and evaluation tooling
- Familiarity with evaluation frameworks such as Promptfoo, Braintrust, LangSmith, DeepEval, Ragas, or similar tools
- Experience implementing qualitative evaluation techniques like LLM-as-judge, rubric scoring, semantic similarity analysis, and dataset-based regression testing
- Proficiency in test automation, with strong experience using Playwright
- Solid SQL skills for validating data, creating test datasets, and ensuring data integrity
- Understanding of operational considerations such as token consumption, latency measurement, and cost tracking
Company Overview