Can LLM Agents Generate Real-World Evidence? Evaluating Observational Studies in Medical Databases

arXiv:2603.22767v1 Announce Type: new Observational studies can yield clinically actionable evidence at scale, but executing them on real-world databases is open-ended and requires coherent decisions across cohort construction, analysis, and reporting. Prior evaluations of LLM agents emphasize isolated steps or single answers, missing the integrity and internal structure of the resulting evidence bundle. To address this gap, we