State-of-the-Art Claims Require State-of-the-Art Evidence

arXiv:2605.17273v1 Announce Type: new State-of-the-Art (SOTA) claims pervade Artificial Intelligence (AI) and Machine Learning (ML) research. These claims rest on benchmark evaluations, where models are ranked by aggregate scores across tasks. Public benchmarks or leaderboards are the most visible instance, but the same structure appears in paper tables throughout the literature. However, such minimal evidence often cannot support these strong claims. We identify a widespread claim-evidence gap in AI benchmarking.