By @AnthropicAI
Publication Date: 2026-01-21 12:00:00
Written by Tristan Hume, a leader on Anthropic's performance optimization team. Tristan designed, and redesigned, the take-home test that has helped Anthropic hire dozens of performance engineers.
Assessing technical candidates becomes more difficult as AI capabilities increase. A problem that distinguishes well between human ability levels today may be trivially solved by models tomorrow, rendering it useless for evaluation.
Since early 2024, our performance engineering team has used a take-home test in which candidates optimize code for a simulated accelerator. Over 1,000 candidates have completed it and dozens now work here, including engineers who launched our Trainium cluster and have delivered every model since Claude 3 Opus.
But each new Claude model has forced us to redesign the test. Given the same deadline, Claude Opus 4 outperformed most human applicants. That still left us able to distinguish the strongest candidates, but Claude Opus 4.5 matched even them. Humans can still…