How to build a better AI benchmark

vm_admin, May 10, 2025

The limits of traditional testing

If AI companies have been slow to respond to the growing failure of benchmarks, it’s partially because the test-scoring approach has been so effective for so long.

One of the biggest early successes of contemporary AI was the ImageNet challenge, a kind of antecedent to today’s benchmarks. Released in 2010 as an open challenge to researchers, the database held more than 3 million images for AI systems to categorize into 1,000 different classes.

…

Article Source
https://www.technologyreview.com/2025/05/08/1116192/how-to-build-a-better-ai-benchmark/