AI RESEARCH
AVATAR: Reinforcement Learning to See, Hear, and Reason Over Video
arXiv CS.CV
arXiv:2508.03100v4 Announce Type: replace Multimodal reasoning over long-horizon video is challenging because it requires precise spatiotemporal fusion and alignment across modalities.