Mind the Unseen Mass: Unmasking LLM Hallucinations via Soft-Hybrid Alphabet Estimation

ArXi:2604.19162v1 Announce Type: new This paper studies uncertainty quantification for large language models (LLMs) under black-box access, where only a small number of responses can be sampled for each query. In this setting, estimating the effective semantic alphabet size--that is, the number of distinct meanings expressed in the sampled responses--provides a useful proxy for downstream risk.