AI RESEARCH

StreamingClaw Technical Report

arXiv CS.CV

arXiv:2603.22120v1

Applications such as embodied intelligence rely on a real-time perception-decision-action closed loop, which poses stringent challenges for streaming video understanding. However, current agents suffer from fragmented capabilities: they support only offline video understanding, lack long-term multimodal memory mechanisms, or struggle to achieve real-time reasoning and proactive interaction under streaming inputs.