SSL-R1: Self-Supervised Visual Reinforcement Post-Training for Multimodal Large Language Models

arXiv:2604.20705v1 Announce Type: new Reinforcement learning (RL) with verifiable rewards (RLVR) has demonstrated great potential for enhancing the reasoning abilities of multimodal large language models (MLLMs). However, the reliance on language-centric priors and expensive manual annotations limits MLLMs' intrinsic visual understanding and the scalability of reward design. In this work, we