AI RESEARCH
Let's Measure Information Step-by-Step: AI-Based Evaluation Beyond Vibes
arXiv cs.LG
arXiv:2508.05469v3 Announce Type: replace We evaluate artificial intelligence (AI) systems without ground truth by exploiting a link between strategic gaming and information loss. Building on established information theory, we analyze which mechanisms resist adversarial manipulation. This motivates mutual evaluation, where the overseer is treated as a strategic player estimating mutual information by prompting, making truthful agent reporting an optimal strategy.
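To make the notion of "information loss" concrete, here is a minimal sketch of a plug-in mutual information estimate on paired discrete samples. It is not the paper's estimator (the abstract describes estimation by prompting, which is not reproduced here). The function name `mutual_information` and the toy data are hypothetical, chosen only for illustration. The example shows a standard information-theoretic fact: post-processing a report (here, coarsening it) cannot increase its mutual information with the underlying signal.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))  # empirical joint counts
    px = Counter(xs)              # empirical marginal counts of X
    py = Counter(ys)              # empirical marginal counts of Y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


# Toy data (assumed for illustration): a signal with four equally likely values.
truth = [0, 1, 2, 3, 0, 1, 2, 3]
faithful = list(truth)               # report that preserves the signal
coarse = [r // 2 for r in faithful]  # lossy post-processing of the report
constant = [0] * len(truth)          # report that ignores the signal

mi_faithful = mutual_information(truth, faithful)
mi_coarse = mutual_information(truth, coarse)
mi_constant = mutual_information(truth, constant)
```

On this toy data the faithful report carries 2 bits about the signal, the coarsened report 1 bit, and the constant report 0 bits. Plug-in estimates are biased on small samples, so a practical evaluation would need a more careful estimator.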