MedArena: Comparing LLMs for Medicine-in-the-Wild Clinician Preferences

arXiv:2603.15677v1 Announce Type: new Large language models (LLMs) are increasingly central to clinician workflows, spanning clinical decision-making, medical education, and patient communication. However, current evaluation methods for medical LLMs rely heavily on static, templated benchmarks that fail to capture the complexity and dynamics of real-world clinical practice, creating a disconnect between benchmark performance and clinical utility.