PRISM: Probing Reasoning, Instruction, and Source Memory in LLM Hallucinations

arXiv:2604.16909v1 Announce Type: new

As large language models (LLMs) evolve from conversational assistants into agents capable of handling complex tasks, they are increasingly deployed in high-risk domains. However, existing benchmarks largely rely on mixed queries and posterior, output-level scoring, which quantifies hallucination severity but offers limited insight into where and why hallucinations arise in the generation pipeline. We therefore …