By Jason Nelson
Publication Date: 2026-05-10 13:01:00
In brief
- A Stanford researcher has developed a Survivor-style game in which AI models form alliances and vote out rivals.
- The benchmark aims to address growing problems with saturated and contaminated AI assessments.
- OpenAI’s GPT-5.5 ranked first across 999 multiplayer games involving 49 AI models.
AI models are now playing “Survivor” – so to speak.
In a new Stanford research project called “Agent Island,” AI agents negotiate alliances, accuse each other of collusion, manipulate votes and eliminate rivals in multiplayer strategy games designed to test behaviors that traditional benchmarks miss.
The study, published Tuesday by Connacher Murphy, a research manager at the Stanford Digital Economy Lab, said many AI benchmarks become unreliable because models eventually learn to solve them and because benchmark data often ends up in training sets. Murphy created Agent Island as a dynamic benchmark where AI agents compete in Survivor-style elimination matches instead…