VideoChat-A1: Thinking with Long Videos by Chain-of-Shot Reasoning
arXiv CS.CV
arXiv:2506.06097v2 Announce Type: replace
Recent advances in video understanding have been driven by multimodal large language models (MLLMs). However, these MLLMs are good at analyzing short videos but struggle to understand videos with a longer context. To address this difficulty, several agent-based methods have been proposed that use MLLMs as agents to retrieve extra contextual knowledge from a long video.