AI RESEARCH

ReFACT: A Benchmark for Scientific Confabulation Detection with Positional Error Annotations

arXiv CS.CL

arXiv:2509.25868v3 Announce Type: replace The mechanisms underlying scientific confabulation in Large Language Models (LLMs) remain poorly understood. We