Learning Long-term Motion Embeddings for Efficient Kinematics Generation

arXiv:2604.11737v1 Announce Type: new Understanding and predicting motion is a fundamental component of visual intelligence. Although modern video models exhibit strong comprehension of scene dynamics, exploring multiple possible futures through full video synthesis remains prohibitively inefficient. We model scene dynamics orders of magnitude more efficiently by operating directly on a long-term motion embedding that is learned from large-scale trajectories obtained from tracker models.