Attention Sinks as Internal Signals for Hallucination Detection in Large Language Models

arXiv:2604.10697v1 Announce Type: cross Large language models frequently exhibit hallucinations: fluent and confident outputs that are factually incorrect or unsupported by the input context. While recent hallucination detection methods have explored various features derived from attention maps, the underlying mechanisms they exploit remain poorly understood.