AD-Copilot: A Vision-Language Assistant for Industrial Anomaly Detection via Visual In-context Comparison

arXiv:2603.13779v1 Announce Type: cross Multimodal Large Language Models (MLLMs) have achieved impressive success in natural visual understanding, yet they consistently underperform in industrial anomaly detection (IAD). This is because the general web data on which MLLMs are mostly trained differs significantly from industrial images. Moreover, MLLMs encode each image independently and can only compare images in the language space, making them insensitive to subtle visual differences that are key to