Evaluating Small Open LLMs for Medical Question Answering: A Practical Framework

ArXi:2604.10535v1 Announce Type: cross Incorporating large language models (LLMs) in medical question answering demands than high average accuracy: a model that returns substantively different answers each time it is queried is not a reliable medical tool. Online health communities such as Reddit have become a primary source of medical information for millions of users, yet they remain highly susceptible to misinformation; deploying LLMs as assistants in these settings amplifies the need for output consistency alongside correctness.