Detecting Hallucinations in Large Language Models via Internal Attention Divergence Signals

arXiv:2605.05025v1 Announce Type: new

We propose a lightweight, single-pass uncertainty quantification method for detecting hallucinations in Large Language Models. The method estimates uncertainty from the model's attention matrices, without requiring repeated sampling or external models. Specifically, we measure the Kullback-Leibler divergence between each attention head's distribution and a uniform reference distribution, and use these divergences as features for a logistic regression probe.
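
To make the feature construction concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It relies on the identity KL(p || u) = log n - H(p) for a distribution p over n positions and a uniform reference u. Several details are assumptions that the abstract does not specify: each head is summarized by the mean divergence over its query rows, the uniform reference is taken over exactly the positions present in each row, and the probe is fit with plain batch gradient descent. The names `kl_to_uniform`, `head_features`, `fit_logistic_probe`, and `predict_proba` are hypothetical.

```python
import math

def kl_to_uniform(p):
    """KL(p || u) for a discrete distribution p and a uniform u over len(p) outcomes."""
    n = len(p)
    # KL(p || u) = sum_i p_i * log(p_i * n); zero-probability terms contribute 0.
    return sum(pi * math.log(pi * n) for pi in p if pi > 0.0)

def head_features(attn):
    """attn[h][q] is the attention distribution of head h at query position q
    (non-negative weights summing to 1). Returns one feature per head.
    Assumption: rows are aggregated by their mean; the abstract does not say how."""
    return [sum(kl_to_uniform(row) for row in head) / len(head) for head in attn]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic_probe(X, y, lr=0.5, epochs=500):
    """Fit a logistic regression probe by batch gradient descent on the
    logistic loss. Assumption: optimizer and hyperparameters are illustrative."""
    d = len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def predict_proba(w, b, x):
    """Probability assigned by the probe to the positive class."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Toy inputs: one head, one query row, four key positions.
flat = [[[0.25, 0.25, 0.25, 0.25]]]   # uniform attention
peaked = [[[1.0, 0.0, 0.0, 0.0]]]     # one-hot attention
flat_feats = head_features(flat)      # divergence 0.0 for the flat head
peaked_feats = head_features(peaked)  # divergence log(4) for the one-hot head
```

In practice the rows would come from a single forward pass of the model, and the probe would be trained on labeled generations. Which direction of divergence signals a hallucination is something the probe learns from those labels, so the sketch assumes nothing about it.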