KEPIL: Knowledge-Enhanced Prompt-Image Learning for Prompt-Robust Disease Detection

arXiv:2605.09132v1 Announce Type: new Vision-language models (VLMs) show promise for clinical decision-making in radiology because they enable joint reasoning over radiological images and clinical text, thereby leveraging complementary clinical information. However, radiological findings are long-tailed in practice, leaving some conditions underrepresented and making zero-shot inference essential. Yet current CLIP-style medical VLMs are sensitive to prompt variations and often lack trustworthy external knowledge at inference time, which hinders reliable clinical deployment.