AI RESEARCH

Agent Island: A Saturation- and Contamination-Resistant Benchmark from Multiagent Games

arXiv cs.AI

arXiv:2605.04312v1 Announce Type: new Static capabilities benchmarks suffer from saturation and contamination, making it difficult to track capabilities progress over time. We