Retrieve, Then Classify: Corpus-Grounded Automation of Clinical Value Set Authoring

arXiv:2604.14616v1 Announce Type: cross Clinical value set authoring -- the task of identifying all codes in a standardized vocabulary that define a clinical concept -- is a recurring bottleneck in clinical quality measurement and phenotyping. A natural approach is to prompt a large language model (LLM) to generate the required codes directly, but structured clinical vocabularies are large, version-controlled, and not reliably memorized during pre