From Failure to Feedback: Group Revision Unlocks Hard Cases in Object-Level Grounding

arXiv:2605.15951v1

Fine-tuning large vision-language models with reinforcement learning has emerged as a promising way to improve their object-level grounding. However, existing methods, mainly based on GRPO, assign rewards at the response level. Such sparse, often criterion-induced rewards provide minimal learning signal in challenging scenarios where all candidate responses fail. In this work, we propose a group-revision optimisation paradigm that enhances learning on hard cases.
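
To illustrate why an all-failing group gives almost no learning signal, here is a minimal sketch of the group-relative advantage commonly used in GRPO, where each response's reward is standardized against the other responses sampled for the same prompt. The function name, the `eps` constant, and the binary reward values are assumptions made for illustration and are not taken from the paper; the sketch shows only the baseline failure mode, not the proposed group-revision method.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Standardize each reward within its sampled group (GRPO-style).

    `rewards` holds one scalar reward per candidate response to a single prompt.
    `eps` is an assumed small constant that avoids division by zero.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical binary rewards: 1.0 if a response meets the grounding criterion.
mixed = group_relative_advantages([1.0, 0.0, 0.0, 1.0])

# Hard case: every candidate fails, so all rewards are identical.
all_fail = group_relative_advantages([0.0, 0.0, 0.0, 0.0])

print(mixed)     # positive for successes, negative for failures
print(all_fail)  # every advantage is zero
```

When every reward in the group is the same, each response's deviation from the group mean is zero, so the advantage-weighted policy-gradient term vanishes for that prompt. This is the hard-case regime the abstract refers to.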