Before We Trust Them: Decision-Making Failures in Navigation of Foundation Models

arXiv:2601.05529v4 Announce Type: replace High success rates on navigation-related tasks do not necessarily translate into reliable decision making by foundation models. To examine this gap, we evaluate current models on six diagnostic tasks spanning three settings: reasoning under complete spatial information, reasoning under incomplete spatial information, and reasoning under safety-relevant information.