Benchmarking Multi-turn Medical Diagnosis: Hold, Lure, and Self-Correction

arXiv:2604.04325v1 Announce Type: new Large language models (LLMs) achieve high accuracy in medical diagnosis when all clinical information is provided in a single turn, yet how they behave under multi-turn evidence accumulation, a setting closer to real clinical reasoning, remains unexplored. We