Large Video Planner Enables Generalizable Robot Control

arXiv:2512.15840v2 Announce Type: replace-cross General-purpose robots require decision-making models that generalize across diverse tasks and environments. Recent works build robot foundation models by extending multimodal large language models (MLLMs) with action outputs, creating vision-language-action (VLA) systems. These efforts are motivated by the intuition that MLLMs' large-scale language and image pre