AI RESEARCH
Symphony: A Cognitively-Inspired Multi-Agent System for Long-Video Understanding
arXiv CS.CV
arXiv:2603.17307v1 Announce Type: new Despite their rapid development and widespread application, MLLM agents still struggle with long-form video understanding (LVU) tasks, which are characterized by high information density and extended temporal spans. Recent research on LVU agents demonstrates that simple task decomposition and collaboration mechanisms are insufficient for long-chain reasoning tasks. Moreover, directly reducing the temporal context through embedding-based retrieval may discard information that is key to complex problems.