By research.google
Publication Date: 2026-03-31 12:00:00
Google Research explores the trade-off between the number of articles and the number of human raters per article to improve the reproducibility of AI benchmarks and capture the nuances of human disagreement.