AI RESEARCH
Screen Before You Interpret: A Portable Validity Protocol for Benchmark-Based LLM Confidence Signals
arXiv cs.CL
arXiv:2604.17714v1 Announce Type: new
LLM confidence signals are used for abstention, routing, and safety-critical decisions. No standard practice exists for checking whether a confidence signal carries item-level information before building on it. We transfer the validity screening principle from clinical personality assessment (PAI, MMPI-3) as a portable protocol for benchmark-based LLM confidence data. The protocol specifies three core indices (L, Fp, RBS), a structural indicator (TRIN), and an item-sensitivity statistic, computed from a single 2x2 contingency table.
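The abstract does not say what the two axes of the 2x2 table are, nor how L, Fp, RBS, TRIN, or the item-sensitivity statistic are computed from it. As a minimal sketch of the kind of input involved, the Python below assumes the table crosses answer correctness with a binarized confidence score. The names `ContingencyTable` and `build_table` and the 0.5 threshold are hypothetical choices made for illustration and do not come from the paper. The sketch only tallies the four cells and makes no attempt to reproduce the protocol's indices.

```python
from typing import NamedTuple, Sequence


class ContingencyTable(NamedTuple):
    """Cell counts for correctness x binarized confidence (assumed axes)."""
    correct_high: int
    correct_low: int
    incorrect_high: int
    incorrect_low: int


def build_table(
    correct: Sequence[bool],
    confidence: Sequence[float],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> ContingencyTable:
    """Tally per-item benchmark results into a 2x2 table."""
    if len(correct) != len(confidence):
        raise ValueError("correct and confidence must have the same length")

    ch = cl = ih = il = 0
    for is_correct, conf in zip(correct, confidence):
        high = conf >= threshold
        if is_correct and high:
            ch += 1
        elif is_correct:
            cl += 1
        elif high:
            ih += 1
        else:
            il += 1
    return ContingencyTable(ch, cl, ih, il)


# Toy data: five benchmark items with made-up confidence scores.
table = build_table(
    correct=[True, True, False, False, True],
    confidence=[0.9, 0.4, 0.8, 0.2, 0.7],
)
print(table)
```

Any index defined on these four counts can then be computed from a single `ContingencyTable` value, which is what makes a table-based screen portable across benchmarks under this assumed layout.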