European boffins want AI model tests put to the test

vm_admin, February 15, 2025

AI model makers love to flex their benchmark scores. But how trustworthy are these numbers? What if the tests themselves are rigged, biased, or just plain meaningless?

OpenAI’s o3 debuted with claims that, having been trained on a publicly available ARC-AGI dataset, the LLM scored a “breakthrough 75.7 percent” on ARC-AGI’s semi-private evaluation dataset with a $10K compute limit. ARC-AGI is a set of puzzle-like inputs that AI models try to solve as a measure of intelligence.

Google’s…

Article Source
https://www.theregister.com/2025/02/15/boffins_question_ai_model_test/
