VLM-SubtleBench: How Far Are VLMs from Human-Level Subtle Comparative Reasoning?

arXiv:2603.07888v1 Announce Type: cross The ability to distinguish subtle differences between visually similar images is essential for diverse domains such as industrial anomaly detection, medical imaging, and aerial surveillance. While comparative reasoning benchmarks for vision-language models (VLMs) have recently emerged, they primarily focus on images with large, salient differences and fail to capture the nuanced reasoning required for real-world applications. In this work, we