The limits of traditional testing
If AI companies have been slow to respond to the growing failure of benchmarks, it’s partly because the test-scoring approach has been so effective for so long.
One of the biggest early successes of contemporary AI was the ImageNet challenge, a kind of antecedent to today’s benchmarks. Released in 2010 as an open challenge to researchers, the database held more than 3 million images for AI systems to categorize into 1,000 classes.
…
Article Source
https://www.technologyreview.com/2025/05/08/1116192/how-to-build-a-better-ai-benchmark/