Detecting and Correcting Reference Hallucinations in Commercial LLMs and Deep Research Agents

arXiv:2604.03173v1 Announce Type: new Large language models and deep research agents supply citation URLs to support their claims, yet the reliability of these citations has not been systematically measured. We address six research questions about citation URL validity using 10 models and agents on DRBench (53,090 URLs) and 3 models on ExpertQA (168,021 URLs across 32 academic fields). We find that 3-13% of citation URLs are hallucinated (they have no record in the Wayback Machine and likely never existed), while 5-18% are non-resolving overall.
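
The distinction the abstract draws between hallucinated and non-resolving URLs can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes two boolean observations per URL have already been collected (whether the URL resolves, and whether the Wayback Machine holds any record of it), and it assumes, as the reported ranges suggest, that hallucinated URLs are counted as a subset of non-resolving ones. The names `UrlStatus`, `classify_url`, `summarize`, and the `STALE` label for dead-but-archived links are hypothetical and do not come from the paper.

```python
from enum import Enum


class UrlStatus(Enum):
    LIVE = "live"                  # the URL resolves today
    STALE = "stale"                # hypothetical label: dead, but an archive record exists
    HALLUCINATED = "hallucinated"  # dead and no archive record, so it likely never existed


def classify_url(resolves: bool, has_archive_record: bool) -> UrlStatus:
    """Classify one citation URL from two precomputed observations (assumed inputs)."""
    if resolves:
        return UrlStatus.LIVE
    if has_archive_record:
        return UrlStatus.STALE
    return UrlStatus.HALLUCINATED


def summarize(observations):
    """Aggregate (resolves, has_archive_record) pairs into rates over all URLs."""
    counts = {status: 0 for status in UrlStatus}
    for resolves, archived in observations:
        counts[classify_url(resolves, archived)] += 1
    total = sum(counts.values())
    # Assumption: non-resolving covers both stale and hallucinated URLs.
    non_resolving = counts[UrlStatus.STALE] + counts[UrlStatus.HALLUCINATED]
    return {
        "total": total,
        "non_resolving_rate": non_resolving / total if total else 0.0,
        "hallucinated_rate": counts[UrlStatus.HALLUCINATED] / total if total else 0.0,
    }
```

Under these assumptions the hallucinated rate can never exceed the non-resolving rate, which matches the ordering of the two ranges reported above.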