CounterCount: A Diagnostic Framework for Counting Bias in Vision Language Models

arXiv:2605.17826v1 (Announce Type: new)

Vision-Language Models (VLMs) excel at multimodal reasoning, yet it remains unclear whether their answers are grounded in visual evidence or driven by learned language and world priors. Counting provides a precise testbed: when the visual evidence conflicts with canonical object knowledge, a model must rely on the image rather than on a prototypical count. We …