compiled a list of 2500+ vision benchmarks for VLMs

r/LocalLLaMA
Generative AI AI Research

I love reading benchmark / eval papers. It's one of the best way to stay up-to-date with progress in Vision Language Models, and understand where they fall short. Vision tasks vary quite a lot from one to another. For example: vision tasks that require high-level semantic understanding of the image. Models do quite well in them. Popular general benchmarks like MMMU are good for that. visual reasoning tasks where VLMs are given a visual puzzle (think IQ-style test). VLMs perform quite poorly on them. Barely above a random guess. Benchmarks such as VisuLogic are designed for this.