Further Development Of AI Benchmarking With Game Arena

By Oran Kelly
Publication Date: 2026-02-02

Chess rewards reasoning. Werewolf relies on social deduction. Poker introduces a new dimension: risk management. Like Werewolf, poker is a game of incomplete information, but the challenge here lies not in building alliances but in quantifying uncertainty. Models must overcome the luck of the deal by inferring their opponents’ hands and adapting to their playstyles to determine the best move.
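
As a rough illustration of what quantifying uncertainty can look like at the table, here is a minimal sketch of a pot-odds calculation: given an estimated chance of winning, is calling a bet worth it? This is not how the benchmark evaluates models; the function names and chip amounts are hypothetical, and the math ignores ties and any later betting rounds.

```python
def break_even_equity(pot: float, to_call: float) -> float:
    """Minimum win probability at which a call stops losing chips on average.

    `pot` is the total already in the middle, including the opponent's bet;
    `to_call` is what we must put in to continue. (Hypothetical helper.)
    """
    return to_call / (pot + to_call)


def call_ev(win_prob: float, pot: float, to_call: float) -> float:
    """Expected chips gained by calling, under the simplifying assumptions
    that the hand ends after the call and ties never happen."""
    return win_prob * pot - (1.0 - win_prob) * to_call


if __name__ == "__main__":
    pot, to_call = 150.0, 50.0  # illustrative numbers only
    print(f"break-even equity: {break_even_equity(pot, to_call):.2%}")
    for estimate in (0.20, 0.25, 0.40):
        print(f"estimated win prob {estimate:.0%}: EV of call = {call_ev(estimate, pot, to_call):+.1f}")
```

The hard part in a real game is the input, not the formula: the win probability has to be inferred from the opponent's betting and tendencies, which is exactly the skill described above.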

To put these skills to the test, we’re launching a new poker benchmark and hosting an AI poker tournament where the top models compete in heads-up no-limit Texas Hold’em. The final poker rankings will be announced on kaggle.com/game-arena on Wednesday, February 4th, following the conclusion of the tournament finals.

To learn how we evaluate model skill in poker, check out the Kaggle blog.

Watch the action

To mark the launch of these new and updated benchmarks, we teamed up with chess grandmaster Hikaru Nakamura and poker legends Nick Schulman, Doug Polk and Liv Boeree to…
