VLMs Trace Without Tracking: Diagnosing Failures in Visual Path Following

arXiv:2605.15672v1 Announce Type: cross Vision-language models (VLMs) achieve strong performance on multimodal benchmarks, but may still lack robust control over basic visual operations. We study \textit{line tracing}, where a model must follow a selected visual path through successive local continuations. To isolate this ability, we design controlled tracing tasks that