SIMPACT: Simulation-Enabled Action Planning using Vision-Language Models

arXiv:2512.05955v2 Announce Type: replace-cross Vision-Language Models (VLMs) exhibit remarkable common-sense and semantic reasoning capabilities. However, they lack a grounded understanding of physical dynamics. This limitation arises from