AI RESEARCH

Evaluation format, not model capability, drives triage failure in the assessment of consumer health AI

arXiv cs.AI

arXiv:2603.11413v1 Announce Type: cross

Ramaswamy et al. reported in Nature Medicine that ChatGPT Health under-triages 51.6% of emergencies, concluding that consumer-facing AI triage poses safety risks. However, their evaluation used an exam-style protocol (forced A/B/C/D output, knowledge suppression, and suppression of clarifying questions) that differs fundamentally from how consumers actually use health chatbots.
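As a rough illustration of the headline metric, an under-triage rate can be read as the share of true emergencies that a model assigns to a less urgent category. The sketch below assumes a hypothetical A/B/C/D scale in which A is the most urgent level. The mapping `URGENCY_RANK` and the function `under_triage_rate` are illustrative assumptions, not the scoring procedure of the cited evaluation or of this paper.

```python
from typing import Sequence

# Hypothetical urgency ordering: a lower rank means more urgent.
# This A/B/C/D mapping is an assumption for illustration only.
URGENCY_RANK = {"A": 0, "B": 1, "C": 2, "D": 3}


def under_triage_rate(gold: Sequence[str], predicted: Sequence[str], level: str = "A") -> float:
    """Return the fraction of cases whose gold label is `level` that the
    model assigned to a less urgent level (a higher rank)."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    target_rank = URGENCY_RANK[level]
    # Keep only the predictions for cases that truly belong to `level`.
    cases = [p for g, p in zip(gold, predicted) if g == level]
    if not cases:
        raise ValueError(f"no cases with gold label {level!r}")
    # Count predictions that are strictly less urgent than the gold level.
    under = sum(1 for p in cases if URGENCY_RANK[p] > target_rank)
    return under / len(cases)


if __name__ == "__main__":
    gold = ["A", "A", "A", "A", "B", "C"]
    pred = ["A", "B", "C", "A", "B", "D"]
    print(f"{under_triage_rate(gold, pred):.1%}")  # prints 50.0%
```

Under this sketch, the metric depends entirely on a model's answer being mapped onto a fixed category scale, which is the forced-choice step the abstract argues does not reflect how consumers interact with health chatbots.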