Can Vision Foundation Models Navigate? Zero-Shot Real-World Evaluation and Lessons Learned

arXiv:2603.25937v1

Visual Navigation Models (VNMs) promise generalizable robot navigation by learning from large-scale visual demonstrations. Despite growing real-world deployment, existing evaluations rely almost exclusively on success rate (whether the robot reaches its goal), a metric that conceals trajectory quality, collision behavior, and robustness to environmental change. We present a real-world evaluation of five state-of-the-art VNMs (GNM, ViNT, NoMaD, NaviBridger, and CrossFormer) across two robot platforms and five environments spanning indoor and outdoor settings.
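To illustrate why success rate alone can hide differences between runs, here is a minimal sketch that scores two hypothetical 2D trajectories. The `Run` record, the 0.5 m goal radius, and the path-efficiency and collision-count metrics are illustrative assumptions for this example only; they are not the metrics or data used in the evaluation described above.

```python
import math
from dataclasses import dataclass


@dataclass
class Run:
    """Hypothetical record of one navigation episode (illustrative only)."""
    path: list[tuple[float, float]]  # (x, y) robot positions in metres
    goal: tuple[float, float]        # (x, y) goal position in metres
    collisions: int                  # number of contacts logged during the run


def path_length(path: list[tuple[float, float]]) -> float:
    # Sum of Euclidean distances between consecutive positions.
    return sum(math.dist(a, b) for a, b in zip(path, path[1:]))


def success(run: Run, radius: float = 0.5) -> bool:
    # Assumed success criterion: final position lies within `radius` of the goal.
    return math.dist(run.path[-1], run.goal) <= radius


def path_efficiency(run: Run) -> float:
    # Straight-line start-to-goal distance divided by distance travelled;
    # 1.0 means a perfectly direct path, lower values mean detours.
    travelled = path_length(run.path)
    straight = math.dist(run.path[0], run.goal)
    return straight / travelled if travelled > 0 else 0.0


def summarize(runs: list[Run]) -> dict[str, float]:
    # Aggregate success rate alongside the two trajectory-level metrics.
    n = len(runs)
    return {
        "success_rate": sum(success(r) for r in runs) / n,
        "mean_efficiency": sum(path_efficiency(r) for r in runs) / n,
        "mean_collisions": sum(r.collisions for r in runs) / n,
    }


# Two hypothetical runs that both reach the goal at (10, 0).
direct = Run(path=[(0, 0), (5, 0), (10, 0)], goal=(10, 0), collisions=0)
detour = Run(path=[(0, 0), (0, 5), (5, 5), (10, 5), (10, 0)], goal=(10, 0), collisions=2)

print(summarize([direct]))
print(summarize([detour]))
```

Both runs report a success rate of 1.0, yet the detour run travels twice as far (efficiency 0.5) and logs two collisions. This is the kind of difference a success-only evaluation would not surface.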