Multi-turn Physics-informed Vision-language Model for Physics-grounded Anomaly Detection

arXiv:2603.15237v1 Announce Type: new Vision-Language Models (VLMs) demonstrate strong general-purpose reasoning but remain limited in physics-grounded anomaly detection, where causal understanding of dynamics is essential. Existing VLMs, trained predominantly on appearance-centric correlations, fail to capture kinematic constraints, leading to poor performance on anomalies such as irregular rotations or violated mechanical motions. We