From Pixels to BFS: High Maze Accuracy Does Not Imply Visual Planning

ArXi:2603.26839v1 Announce Type: new How do multimodal models solve visual spatial tasks -- through genuine planning, or through brute-force search in token space? We