AI RESEARCH
EgoVITA: Learning to Plan and Verify for Egocentric Video Reasoning
arXiv cs.CV
arXiv:2511.18242v2 Announce Type: replace Egocentric video understanding requires procedural reasoning under partial observability and continuously shifting viewpoints. Current multimodal large language models (MLLMs) struggle in this setting, often generating responses that are plausible but visually inconsistent or weakly grounded. We