AI RESEARCH
VideoARM: Agentic Reasoning over Hierarchical Memory for Long-Form Video Understanding
arXiv cs.CL
arXiv:2512.12360v2 Announce Type: replace-cross Long-form video understanding remains challenging due to the extended temporal structure and dense multimodal cues. Despite recent progress, many existing approaches still rely on hand-crafted reasoning pipelines or employ token-consuming video preprocessing to guide MLLMs in autonomous reasoning. To overcome these limitations, we