[Remote] Software Engineer
Note: This is a remote position open to candidates in the USA. Gold Group Ltd is a leading AI research institute seeking a Software Engineer to join its Benchmarking team. The role involves developing evaluations of AI models and collaborating with researchers to influence the AI community.
Responsibilities
- Develop and run evaluations of the latest AI models
- Build new benchmarks
- Maintain evaluation infrastructure
- Collaborate directly with researchers producing work that influences policymakers, industry leaders, and the wider AI community
Skills
- Strong software engineering experience (language agnostic – Python preferred)
- An interest in LLM evaluations, benchmarking, or AI capability testing
- Curiosity about frontier AI and a research-oriented mindset
- Enjoyment of experimentation, solving difficult technical problems, and improving evaluation frameworks
- Experience with evaluation frameworks such as Inspect
Benefits
- Fully remote
- Three international company retreats each year
- Flexible working hours
Company Overview