Complementing Self-Consistency with Cross-Model Disagreement for Uncertainty Quantification

arXiv:2604.17112v1 Announce Type: new Large language models (LLMs) often produce confident yet incorrect responses, and uncertainty quantification is one potential route to more robust usage. Recent work routinely relies on self-consistency to estimate aleatoric uncertainty (AU), yet this proxy collapses when a model is overconfident and produces the same incorrect answer across samples. We analyze this regime and show that cross-model semantic disagreement is higher on incorrect answers precisely when AU is low. Motivated by this, we.
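
To make the failure mode concrete, the following is a minimal illustrative sketch, not the method of the paper (which this excerpt does not describe). It assumes answers can be compared by exact string match as a stand-in for semantic equivalence; the function names, the toy question, and the toy answers are all hypothetical. Self-consistency is scored as the entropy of one model's sampled answers, and cross-model disagreement as the fraction of other models whose answer differs.

```python
import math
from collections import Counter


def self_consistency_entropy(samples):
    """Shannon entropy (in nats) of the empirical answer distribution
    over repeated samples from a single model. Hypothetical helper."""
    counts = Counter(samples)
    n = len(samples)
    return sum(-(c / n) * math.log(c / n) for c in counts.values())


def cross_model_disagreement(answer, other_answers):
    """Fraction of other models' answers that differ from `answer`.
    Exact string match is an assumption standing in for a semantic
    equivalence check. Hypothetical helper."""
    if not other_answers:
        return 0.0
    return sum(a != answer for a in other_answers) / len(other_answers)


# Toy data (made up): an overconfident model repeats the same wrong
# answer to "What is the capital of Australia?" on every sample.
samples = ["Sydney"] * 5
# Toy answers from three other models to the same question.
others = ["Canberra", "Canberra", "Sydney"]

au = self_consistency_entropy(samples)
majority = Counter(samples).most_common(1)[0][0]
disagreement = cross_model_disagreement(majority, others)

print(f"self-consistency entropy: {au:.3f}")
print(f"cross-model disagreement: {disagreement:.3f}")
```

In this toy case the entropy is zero, so self-consistency alone reports no uncertainty, while two of the three other models disagree with the repeated answer. That is the regime the abstract describes: low AU alongside high cross-model disagreement on an incorrect answer.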