Instruction-as-State: Environment-Guided and State-Conditioned Semantic Understanding for Embodied Navigation
arXiv CS.CV
arXiv:2604.18223v1 Announce Type: new Vision-and-Language Navigation requires agents to follow natural-language instructions in visually changing environments. A central challenge is the dynamic entanglement between language and observations: the meaning of an instruction shifts as the agent's field of view and spatial context evolve. However, many existing models encode the instruction as a static global representation, which limits their ability to adapt the instruction's meaning to the current visual context. We therefore ...