SpaAct: Spatially-Activated Transition Learning with Curriculum Adaptation for Vision-Language Navigation

arXiv:2604.27620v1. Vision-and-Language Navigation (VLN) aims to enable an embodied agent to follow natural-language instructions and navigate to a target location in unseen 3D environments. We argue that adapting vision-language models (VLMs) to VLN requires endowing them with two complementary capabilities for acquiring such awareness, namely backward action reasoning (why) and forward transition prediction (how). Based on this insight, we propose SpaAct, a simple yet effective