VLN-Cache: Enabling Token Caching for VLN Models with Visual/Semantic Dynamics Awareness

arXiv:2603.07080v1 Announce Type: cross Vision-and-Language Navigation (VLN) increasingly relies on large vision-language models, but their inference cost conflicts with real-time deployment. Token caching is a promising