AI RESEARCH

CoMo: Learning Continuous Latent Motion from Internet Videos for Scalable Robot Learning

arXiv CS.CV

arXiv:2505.17006v2 Announce Type: replace

Unsupervised learning of latent motion from Internet videos is crucial for robot learning. Existing discrete methods typically guard against shortcut learning, where the latent encodes excessive static background instead of motion, by applying vector quantization with a small codebook. However, they suffer from information loss and struggle to capture complex, fine-grained dynamics. Moreover, there is an inherent gap between the distribution of discrete latent motion and that of continuous robot actions, which hinders the joint learning of a unified policy.
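The information-loss trade-off of a small codebook can be made concrete with a toy vector-quantization sketch. This is an illustration under simple assumptions, not CoMo's method or any published implementation. The helper names (`quantize`, `mean_sq_error`), the 2-D "motion" latents, and both codebooks are hypothetical. Each continuous latent is replaced by its nearest codebook entry, and the average squared error measures how much detail that substitution discards.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(z: Vector, codebook: List[Vector]) -> Tuple[int, Vector]:
    """Return (index, code) of the codebook entry nearest to z.

    Ties go to the lowest index, because min() keeps the first minimal item.
    """
    idx = min(range(len(codebook)), key=lambda i: sq_dist(z, codebook[i]))
    return idx, codebook[idx]


def mean_sq_error(latents: List[Vector], codebook: List[Vector]) -> float:
    """Average squared error from replacing each latent with its nearest code."""
    return sum(sq_dist(z, quantize(z, codebook)[1]) for z in latents) / len(latents)


# Hypothetical continuous latent-motion vectors (toy values, not real data).
latents = [(0.1, 0.0), (0.4, 0.2), (0.9, 0.8), (0.5, 0.5)]

# A tiny codebook (strong bottleneck) versus a somewhat larger one.
small_codebook = [(0.0, 0.0), (1.0, 1.0)]
large_codebook = [(0.0, 0.0), (0.5, 0.25), (0.5, 0.5), (1.0, 1.0)]

if __name__ == "__main__":
    print("small codebook MSE:", mean_sq_error(latents, small_codebook))
    print("large codebook MSE:", mean_sq_error(latents, large_codebook))
```

With only two codes, distinct motions such as `(0.4, 0.2)` and `(0.1, 0.0)` collapse onto the same entry, so the small codebook yields a noticeably higher error than the larger one. A latent kept continuous skips this lookup entirely and loses nothing at this step. That is roughly the intuition the abstract appeals to, although a real model must still prevent static-background shortcuts by other means.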