CoLVR: Enhancing Exploratory Latent Visual Reasoning via Contrastive Optimization
arXiv CS.CV
arXiv:2605.08802v1 Announce Type: new Because Latent Visual Reasoning has the potential for exploratory reasoning, recent works enable MLLMs (Multimodal Large Language Models) to perform visual reasoning by propagating continuous hidden states instead of decoding intermediate steps into discrete tokens. However, existing works typically rely on hard alignment objectives that force latent representations to match predefined visual features, thereby severely limiting the exploratory capacity of the latent reasoning process.