RegionReasoner: Region-Grounded Multi-Round Visual Reasoning

arXiv:2602.03733v2 Announce Type: replace Large vision-language models have achieved remarkable progress in visual reasoning, yet most existing systems rely on single-step or text-only reasoning, limiting their ability to iteratively refine understanding across multiple visual contexts. To address this limitation, we