Weakly Supervised Distillation of Hallucination Signals into Transformer Representations

arXiv:2604.06277v1 Announce Type: cross

Existing hallucination detection methods for large language models (LLMs) rely on external verification at inference time, requiring gold answers, retrieval systems, or auxiliary judge models. We ask whether this external supervision can instead be distilled into the model's own representations during