Annotation-Free Visual Reasoning for High-Resolution Large Multimodal Models via Reinforcement Learning

ArXi:2602.23615v2 Announce Type: replace Current Large Multimodal Models (LMMs) struggle with high-resolution visual inputs during the reasoning process, as the number of image tokens increases quadratically with resolution,