VLM-RobustBench: A Comprehensive Benchmark for Robustness of Vision-Language Models

arXiv:2603.06148v1 Announce Type: cross Vision-language models (VLMs) achieve strong performance on standard, high-quality datasets, but how they perform under real-world image distortions is still not well understood. We present VLM-RobustBench, a benchmark spanning 49 augmentation types across noise, blur, weather, digital, and geometric perturbations, evaluated under graded severities (low/mid/high) and binary transforms, yielding 133 corrupted settings.
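
The abstract gives two totals (49 augmentation types, 133 corrupted settings) but not how the types divide between graded and binary transforms. The sketch below works out one split consistent with those totals. It assumes that every graded type is evaluated at exactly three severities and every binary transform counts as a single setting, which the abstract does not state. The function names and the placeholder augmentation names are hypothetical and are not taken from the benchmark.

```python
from typing import List, Optional, Sequence, Tuple


def split_graded_binary(total_types: int, total_settings: int,
                        levels: int = 3) -> Tuple[int, int]:
    """Solve g + b = total_types and levels * g + b = total_settings.

    Assumption (not stated in the abstract): each graded type contributes
    `levels` settings and each binary transform contributes exactly one.
    """
    extra = total_settings - total_types  # equals (levels - 1) * g
    if levels < 2 or extra % (levels - 1) != 0:
        raise ValueError("totals are inconsistent with the assumed counting rule")
    graded = extra // (levels - 1)
    binary = total_types - graded
    if graded < 0 or binary < 0:
        raise ValueError("totals are inconsistent with the assumed counting rule")
    return graded, binary


def enumerate_settings(graded_names: Sequence[str],
                       binary_names: Sequence[str],
                       severities: Sequence[str] = ("low", "mid", "high"),
                       ) -> List[Tuple[str, Optional[str]]]:
    """List (augmentation, severity) pairs; binary transforms get severity None."""
    settings: List[Tuple[str, Optional[str]]] = [
        (name, severity) for name in graded_names for severity in severities
    ]
    settings.extend((name, None) for name in binary_names)
    return settings


graded, binary = split_graded_binary(total_types=49, total_settings=133)

# Placeholder names only; the real augmentation names are not given here.
settings = enumerate_settings(
    [f"graded_{i}" for i in range(graded)],
    [f"binary_{i}" for i in range(binary)],
)

print(graded, binary, len(settings))  # 42 7 133
```

Under that counting rule the totals imply 42 graded types (126 settings) plus 7 binary transforms, which sum to 133. This is an inference from the reported numbers, not a breakdown confirmed by the abstract.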