A Unified Perturbation Framework for Analyzing Leaderboard Stability and Manipulation

arXiv:2605.15761v1

Evaluation leaderboards such as LMArena play a central role in benchmarking large language models by aggregating pairwise human preferences into model rankings, yet the robustness of these rankings remains poorly understood. We present a unified perturbation framework for analyzing Bradley-Terry leaderboards under structured data modifications using influence-based approximations.
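
To make the setting concrete, here is a minimal sketch of how a Bradley-Terry ranking can react to a structured change in the preference data. It is not the method of the paper: it refits the model exactly after the modification instead of using an influence-based approximation, the function name `fit_bradley_terry` is hypothetical, and the win counts are made-up toy numbers.

```python
import numpy as np


def fit_bradley_terry(wins, n_iter=500, tol=1e-10):
    """Fit Bradley-Terry strengths with the classic MM (Zermelo) iteration.

    wins[i, j] is the number of times model i was preferred over model j.
    Assumes every model has at least one win and the diagonal is zero.
    Returns log-strengths centered to mean zero.
    """
    wins = np.asarray(wins, dtype=float)
    n = wins.shape[0]
    games = wins + wins.T            # total comparisons per pair
    total_wins = wins.sum(axis=1)    # total wins per model
    p = np.ones(n)
    for _ in range(n_iter):
        denom = (games / (p[:, None] + p[None, :])).sum(axis=1)
        new_p = total_wins / denom
        new_p /= new_p.sum()         # fix the scale, which is not identifiable
        converged = np.max(np.abs(new_p - p)) < tol
        p = new_p
        if converged:
            break
    scores = np.log(p)
    return scores - scores.mean()


# Toy data (an assumption for illustration): 3 models, 10 votes per pair.
base_wins = np.array([
    [0, 7, 9],
    [3, 0, 6],
    [1, 4, 0],
])

# Structured modification: flip four votes between models 1 and 2.
perturbed_wins = base_wins.copy()
perturbed_wins[1, 2] -= 4
perturbed_wins[2, 1] += 4

base_rank = np.argsort(-fit_bradley_terry(base_wins)).tolist()
perturbed_rank = np.argsort(-fit_bradley_terry(perturbed_wins)).tolist()

print("ranking before:", base_rank)
print("ranking after: ", perturbed_rank)
```

In this toy case, flipping four of the ten votes between the two lower-ranked models swaps their positions while the top model stays in place. An influence-based approximation, as described in the abstract, would aim to estimate such effects without refitting the model for each candidate modification.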