By Michal Sutter
Publication Date: 2026-04-03 22:26:00
Designing algorithms for Multi-Agent Reinforcement Learning (MARL) in imperfect-information games (scenarios such as poker, where players act sequentially and cannot see each other’s private information) has historically relied on manual iteration. Researchers identify weighting schemes, discounting rules, and equilibrium solvers through intuition and trial and error. Google DeepMind researchers propose AlphaEvolve, an LLM-powered evolutionary coding agent that replaces that manual process with automated search.
The research team applies this framework to two established paradigms: Counterfactual Regret Minimization (CFR) and Policy Space Response Oracles (PSRO). In both cases, the system discovers new algorithm variants that are competitive with, or better than, existing hand-designed state-of-the-art baselines. All experiments were run using the OpenSpiel framework.
Background: CFR and PSRO
CFR is an iterative algorithm that decomposes regret…
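To make the notion of regret concrete, here is a minimal sketch of regret matching, the kind of update rule that CFR applies at each decision point. It is not CFR itself and not one of the discovered variants: it runs on plain rock-paper-scissors rather than a sequential game. The names `regret_matching`, `self_play`, and `PAYOFF`, and the biased starting regrets, are assumptions made only for illustration.

```python
# Illustrative sketch only: regret matching in self-play on rock-paper-scissors.
# All names and the starting regrets are hypothetical, not taken from the paper.

# PAYOFF[a][b] is the row player's payoff when row plays a and column plays b.
# Actions: 0 = rock, 1 = paper, 2 = scissors. The game is zero-sum.
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]


def regret_matching(cumulative_regret):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in cumulative_regret]
    total = sum(positive)
    n = len(cumulative_regret)
    if total <= 0.0:
        return [1.0 / n] * n  # no positive regret: fall back to uniform
    return [p / total for p in positive]


def self_play(iterations):
    """Run both players with regret matching and return their average strategies."""
    n = 3
    # Assumption: player 0 starts biased toward rock so the dynamics are non-trivial.
    regrets = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    strategy_sum = [[0.0] * n, [0.0] * n]
    for _ in range(iterations):
        strategies = [regret_matching(r) for r in regrets]
        # Expected payoff of each pure action against the opponent's current mix.
        action_values = [
            [sum(PAYOFF[a][b] * strategies[1][b] for b in range(n)) for a in range(n)],
            [sum(-PAYOFF[a][b] * strategies[0][a] for a in range(n)) for b in range(n)],
        ]
        for p in range(2):
            expected = sum(strategies[p][a] * action_values[p][a] for a in range(n))
            for a in range(n):
                # Regret: how much better action a would have done than the current mix.
                regrets[p][a] += action_values[p][a] - expected
                strategy_sum[p][a] += strategies[p][a]
    return [[s / iterations for s in strategy_sum[p]] for p in range(2)]


if __name__ == "__main__":
    for player, avg in enumerate(self_play(10_000)):
        print(player, [round(x, 3) for x in avg])
```

In a two-player zero-sum game, the average strategies produced this way approach an equilibrium, which for rock-paper-scissors is the uniform mix. Weighting and discounting choices inside a loop like this one are the kind of hand-designed detail the article describes being searched over automatically.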