AI RESEARCH

VideoChat-A1: Thinking with Long Videos by Chain-of-Shot Reasoning

arXiv cs.CV

arXiv:2506.06097v2. Recent advances in video understanding have been driven by multimodal large language models (MLLMs). However, these MLLMs handle short videos well but struggle to understand videos with a longer context. To address this limitation, several agent-based methods have been proposed that use MLLMs as agents to retrieve additional contextual knowledge from a long video.