AI RESEARCH

LensWalk: Agentic Video Understanding by Planning How You See in Videos

arXiv cs.CV

arXiv:2603.24558v1 (announce type: new). The dense, temporal nature of video presents a profound challenge for automated analysis. Despite the use of powerful Vision-Language Models, prevailing methods for video understanding are limited by an inherent disconnect between reasoning and perception: they rely on static, pre-processed information and cannot actively seek raw evidence from the video as their understanding evolves. To address this, we