VisualScratchpad: Inference-time Visual Concepts Analysis in Vision Language Models

arXiv:2603.07335v1 Announce Type: new

High-performing vision language models still produce incorrect answers, yet their failure modes are often difficult to explain. To make model internals accessible and enable systematic debugging, we