AI RESEARCH
Self-Supervised Multisensory Pretraining for Contact-Rich Robot Reinforcement Learning
arXiv CS.LG
arXiv:2511.14427v2 Announce Type: replace-cross Effective contact-rich manipulation requires robots to synergistically leverage vision, force, and proprioception. However, reinforcement learning agents struggle to [...] (MSDP), a novel framework for learning expressive multisensory representations tailored for task-oriented policy learning. MSDP is based on masked autoencoding: it trains a transformer-based encoder to reconstruct the full set of multisensory observations from only a subset of sensor embeddings, which encourages cross-modal prediction and sensor fusion. For downstream policy learning, we
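
The masking step of masked autoencoding can be illustrated with a minimal sketch. It is not the MSDP implementation: the token layout, the 50% mask ratio, the helper names `split_visible_masked` and `masked_mse`, and the zero-valued stand-in for the decoder output are all assumptions made for illustration. It shows only how a random subset of sensor embeddings is kept visible and how a reconstruction error could be scored on the hidden ones.

```python
import random

def split_visible_masked(tokens, mask_ratio, rng):
    """Randomly hide a fraction of sensor tokens (hypothetical helper)."""
    n = len(tokens)
    n_masked = int(n * mask_ratio)
    order = list(range(n))
    rng.shuffle(order)
    masked_idx = sorted(order[:n_masked])
    visible_idx = sorted(order[n_masked:])
    visible = [tokens[i] for i in visible_idx]
    return visible, visible_idx, masked_idx

def masked_mse(predicted, target, masked_idx):
    """Mean squared error computed only over the masked tokens."""
    total, count = 0.0, 0
    for i in masked_idx:
        for p, t in zip(predicted[i], target[i]):
            total += (p - t) ** 2
            count += 1
    return total / count if count else 0.0

# Assumed toy layout: 4 vision tokens, 1 force token, 1 proprioception token,
# each a 2-dimensional embedding.
modalities = ["vision"] * 4 + ["force", "proprio"]
tokens = [[float(i), float(i) + 0.5] for i in range(len(modalities))]

rng = random.Random(0)
visible, visible_idx, masked_idx = split_visible_masked(tokens, 0.5, rng)

# Stand-in for a decoder: predicts zeros for every token. A real model
# would predict the masked tokens from the visible ones.
predicted = [[0.0, 0.0] for _ in tokens]
loss = masked_mse(predicted, tokens, masked_idx)
```

Because the hidden tokens can belong to a different modality than the visible ones, minimizing such a loss requires predicting one sensor from another, which is the cross-modal effect the abstract describes.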