R-C2: Cycle-Consistent Reinforcement Learning Improves Multimodal Reasoning

arXiv:2603.25720v1 Announce Type: new Robust perception and reasoning require consistency across sensory modalities. Yet current multimodal models often violate this principle, yielding contradictory predictions for visual and textual representations of the same concept. Rather than masking these failures with standard voting mechanisms, which can amplify systematic biases, we show that cross-modal inconsistency provides a rich and natural signal for learning. We