World-Value-Action Model: Implicit Planning for Vision-Language-Action Systems

arXiv:2604.14732v1 Announce Type: cross Vision-Language-Action (VLA) models have emerged as a promising paradigm for building embodied agents that ground perception and language into action. However, most existing approaches rely on direct action prediction, lacking the ability to reason over long-horizon trajectories and evaluate their consequences, which limits performance in complex decision-making tasks. In this work, we