Engineering Lead

Description:

Epoch AI is looking for a senior Research Engineer to lead the engineering efforts of our Benchmarking team. In this role, you’ll help us provide independent evaluations of leading AI models so that researchers, developers, and policymakers can better understand AI development.

About the role and team

Epoch AI is a leading research institute investigating trends in artificial intelligence, aiming to provide rigorous, accessible insights into AI development. A core part of our work is understanding the capabilities of state-of-the-art AI models through benchmarking.

We are seeking a talented Senior Engineer to lead the engineering efforts of our Benchmarking team and play a crucial role in expanding and operating our AI Benchmarking Hub. This platform provides independent evaluations of leading AI models on challenging benchmarks, helping researchers, developers, and policymakers understand what AI systems can do and where they are headed.

As the Engineering Lead for Benchmarking, you will drive the technical execution and strategy for our AI Benchmarking Hub. This is a hands-on leadership role where you will own the engineering roadmap and actively contribute code to our evaluation infrastructure (using frameworks like Inspect). Your deep engineering expertise and operational focus will help make our benchmarking timely, rigorous, and transparent.

This role is fully remote, and we expect to be legally able to hire in many countries. If you are unsure whether we can hire in the country where you are based, please email careers@epoch.ai. This role is open to full-time candidates.

Applications are rolling.

Please do not include a cover letter, photograph, or headshot of yourself, or any personal information that is not relevant to the role for which you're applying (including marital status, age, identity traits, etc.).

Key Responsibilities

Own and execute the engineering roadmap: Define, manage, and actively contribute to implementing the long-term engineering roadmap for our Benchmarking infrastructure, ensuring it supports research priorities and enables rapid evaluation cycles.
Team leadership & mentorship: Provide technical guidance and day-to-day management for our current benchmarking research engineer and potential future hires.
Lead and execute timely evaluations: Personally oversee and contribute to the execution of evaluations across our benchmark suite. Ensure a rapid response pipeline is in place and operational to benchmark major new models within tight timeframes (e.g., targeting initial results within days of release).
Implement benchmarks: Implement new and existing AI benchmarks within our evaluation framework (primarily using the Inspect library) to expand the suite of capabilities we track.
Collaborate: Work closely with Epoch AI researchers and analysts to ensure evaluation data and outputs are accurate, insightful, and effectively integrated into our research products and publications.

What we are looking for

Outstanding engineering skills: You have a strong software engineering background with several years of professional experience building and maintaining complex systems. You regularly contribute high-quality, robust, and maintainable code and are comfortable diving deep into existing codebases and infrastructure. We expect most (but not necessarily all) strong candidates to have 10 or more years of engineering experience.
Leadership & people-management experience: You have successfully led small engineering teams, set priorities, and managed direct reports.
Mission-driven: You’re motivated by Epoch AI’s mission to provide rigorous, independent insight into key trends in AI. You want to deliver public, trustworthy evaluations of AI capabilities on challenging benchmarks, empowering researchers, policymakers, and the wider public to make well-informed decisions about AI.
AI domain expertise is a strong plus but not required: Hands-on experience running LLM evaluations, familiarity with evaluation frameworks like Inspect, and a solid grasp of current AI trends are all valuable. However, outstanding engineering skills and an ability to learn quickly matter more than a direct background in these areas.

Organization	Epoch AI
Industry	Management Jobs
Occupational Category	Engineering Lead
Job Location	New York, USA
Shift Type	Morning
Job Type	Full Time
Gender	No Preference
Career Level	Intermediate
Experience	2 Years
Posted at	2025-05-16 1:38 pm
Expires on	2026-08-21