CQA-Eval: Designing Reliable Evaluations of Multi-paragraph Clinical QA under Resource Constraints
arXiv CS.AI
arXiv:2510.10415v2 Announce Type: replace-cross. Evaluating multi-paragraph clinical question answering (QA) systems is resource-intensive and challenging: accurate judgments require medical expertise, and consistent human judgments over multi-paragraph text are difficult to achieve. We