AVATAR: Reinforcement Learning to See, Hear, and Reason Over Video

arXiv CS.CV

arXiv:2508.03100v4 Announce Type: replace

Multimodal reasoning over long-horizon video is challenging because it requires precise spatiotemporal fusion and alignment across modalities.