AI RESEARCH
Agent Island: A Saturation- and Contamination-Resistant Benchmark from Multiagent Games
arXiv cs.AI
arXiv:2605.04312v1 Announce Type: new
Static capabilities benchmarks suffer from saturation and contamination, making it difficult to track capabilities progress over time. We