MultiWorld: Scalable Multi-Agent Multi-View Video World Models

arXiv:2604.18564v1

Video world models have achieved remarkable success in simulating environmental dynamics in response to actions by users or agents. They are modeled as action-conditioned video generation models that take historical frames and current actions as input to predict future frames. Yet most existing approaches are limited to single-agent scenarios and fail to capture the complex interactions inherent in real-world multi-agent systems.
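The action-conditioned formulation described above (historical frames plus the current action in, the next frame out) can be illustrated with a minimal autoregressive rollout. This is only a sketch under stated assumptions, not the method of this paper: `toy_predict_next_frame` and `rollout` are hypothetical names, frames are toy nested lists, and the "model" is a hand-written pixel shift standing in for a learned single-agent video generator.

```python
from typing import Callable, List, Sequence

Frame = List[List[float]]  # toy grayscale frame: rows of pixel values


def toy_predict_next_frame(history: Sequence[Frame], action: int) -> Frame:
    """Hypothetical stand-in for a learned model (assumption, not from the paper).

    Shifts the most recent frame to the right by `action` pixels, wrapping
    around at the edge. A real world model would be a neural network that
    conditions on the whole history, not just the last frame.
    """
    last = history[-1]
    width = len(last[0])
    shift = action % width
    return [row[-shift:] + row[:-shift] if shift else list(row) for row in last]


def rollout(
    predict: Callable[[Sequence[Frame], int], Frame],
    context: Sequence[Frame],
    actions: Sequence[int],
) -> List[Frame]:
    """Autoregressive rollout: each predicted frame is appended to the history
    and fed back in together with the next action."""
    frames: List[Frame] = [[list(row) for row in frame] for frame in context]
    for action in actions:
        frames.append(predict(frames, action))
    return frames[len(context):]  # return only the newly predicted frames


if __name__ == "__main__":
    context = [[[1.0, 0.0, 0.0]]]  # one 1x3 frame with a single bright pixel
    for frame in rollout(toy_predict_next_frame, context, actions=[1, 1]):
        print(frame)
```

The sketch takes a single action per step, which corresponds to the single-agent setting the abstract identifies as the limitation of most existing approaches.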