Quantifying Hallucinations in Language Models on Medical Textbooks

arXiv:2603.09986v1 Announce Type: cross Hallucination, the tendency of large language models to produce responses containing factually incorrect and unsupported claims, is a serious problem in natural language processing for which we do not yet have an effective mitigation. Existing benchmarks for medical QA rarely evaluate this behavior against a fixed evidence source. We ask how often hallucinations occur on textbook-grounded QA and how responses to medical QA prompts vary across models.