AI RESEARCH
What Limits Vision-and-Language Navigation?
arXiv CS.AI
arXiv:2605.13328v1 Announce Type: cross

Vision-and-Language Navigation (VLN) is a cornerstone of embodied intelligence. However, current agents often suffer significant performance degradation when moving from simulation to real-world deployment, primarily due to perceptual instability (e.g., lighting variations and motion blur) and under-specified instructions. While existing methods attempt to bridge this gap by scaling up model size and