NICE FACT: Diagnosing and Calibrating VLMs in Quantitative Reasoning for Kinematic Physics

arXiv:2605.08452v1 Announce Type: new The ability to derive precise spatial and physical insights is a cornerstone of vision-language models (VLMs), yet their poor performance on related spatial intelligence tasks, such as physical reasoning, remains a fundamental barrier. The community critically lacks a scientific analysis revealing whether VLMs reach their answers faithfully or merely make plausible guesses. This work aims to provide a fundamental understanding of how VLMs perceive the physical world and apply physical laws, while assessing the reliability of model confidence.