Describe-Then-Act: Proactive Agent Steering via Distilled Language-Action World Models

arXiv:2603.23149v1 Announce Type: new

Deploying safety-critical agents requires anticipating the consequences of actions before they are executed. While world models offer a natural paradigm for this kind of proactive foresight, current approaches that rely on visual simulation incur prohibitive latencies, often exceeding several seconds per step. In this work, we challenge the assumption that visual processing is necessary for failure prevention.