AIM: Intent-Aware Unified world action Modeling with Spatial Value Maps

arXiv:2604.11135v1 Announce Type: cross Pretrained video generation models provide strong priors for robot control, but existing unified world action models still struggle to decode reliable actions without substantial robot-specific