Reasoning-Driven Anomaly Detection and Localization with Image-Level Supervision

arXiv:2603.27179v1

Multimodal large language models (MLLMs) have recently demonstrated remarkable reasoning and perceptual abilities for anomaly detection. However, most approaches remain confined to image-level anomaly detection and textual reasoning, while pixel-level localization still relies on external vision modules and dense annotations.