AI RESEARCH
Audio-Visual World Models: Towards Multisensory Imagination in Sight and Sound
arXiv CS.CV
arXiv:2512.00883v2 Announce Type: replace-cross World models simulate environmental dynamics to enable agents to plan and reason about future states. While existing approaches have primarily focused on visual observations, real-world perception inherently involves multiple sensory modalities. Audio provides crucial spatial and temporal cues such as sound source localization and acoustic scene properties, yet its integration into world models remains largely unexplored.