GTR-Bench: Evaluating Geo-Temporal Reasoning in Vision-Language Models

arXiv:2510.07791v3

Recently, the spatial-temporal intelligence of Visual-Language Models (VLMs) has attracted considerable attention due to its importance for autonomous driving, embodied AI, and general AI.