Cross-Model Disagreement as a Label-Free Correctness Signal

arXiv:2603.25450v1 Announce Type: new Detecting when a language model is wrong without ground truth labels is a fundamental challenge for safe deployment. Existing approaches rely on a model's own uncertainty -- such as token entropy or confidence scores -- but these signals fail critically on the most dangerous failure mode: confident errors, where a model is wrong but certain. In this work we