AI RESEARCH
WorldAgents: Can Foundation Image Models be Agents for 3D World Models?
arXiv cs.CV
arXiv:2603.19708v1 (announce type: new)
Given the remarkable ability of 2D foundation image models to generate high-fidelity outputs, we investigate a fundamental question: do 2D foundation image models inherently possess 3D world model capabilities? To answer this, we systematically evaluate multiple state-of-the-art image generation models and Vision-Language Models (VLMs) on the task of 3D world synthesis. To harness and benchmark their potential implicit 3D capabilities, we propose an agentic framing to facilitate 3D world generation.