AI RESEARCH

VISTA: Validation-Guided Integration of Spatial and Temporal Foundation Models with Anatomical Decoding for Rare-Pathology VCE Event Detection

arXiv CS.CV

ArXi:2603.18343v1 Announce Type: new Capsule endoscopy event detection is challenging because diagnostically relevant findings are sparse, visually heterogeneous, and embedded in long, noisy video streams, while evaluation is performed at the event level rather than by frame accuracy alone. We therefore formulate the RARE-VISION task as a metric-aligned event detection problem instead of a purely frame-wise classification task.