AI RESEARCH

Real-Time Visual Attribution Streaming in Thinking Model

arXiv CS.CV

ArXi:2604.16587v1 Announce Type: new We present an amortized framework for real-time visual attribution streaming in multimodal thinking models. When these models generate code from a screenshot or solve math problems from images, their long reasoning traces should be grounded in visual evidence. However, verifying this reliance is challenging: faithful causal methods require costly repeated backward passes or perturbations, while raw attention maps offer instant access, they lack causal validity. To resolve this, we