AI RESEARCH

Rethinking Reward Signals in Video GRPO: When Scores Become Targets

arXiv cs.CV

arXiv:2511.19356v2 Announce Type: replace

Group Relative Policy Optimization (GRPO) enables stable and preference-oriented updates via group-wise comparisons for post-