Learning Transferable Temporal Primitives for Video Reasoning via Synthetic Videos
arXiv CS.CV
arXiv:2603.17693v1
The transition from image to video understanding requires vision-language models (VLMs) to shift from recognizing static patterns to reasoning over temporal dynamics such as motion trajectories, speed changes, and state transitions. Yet current post-