AI RESEARCH
Rethinking Reward Signals in Video GRPO: When Scores Become Targets
arXiv CS.CV
arXiv:2511.19356v2 Announce Type: replace Group Relative Policy Optimization (GRPO) enables stable and preference-oriented updates via group-wise comparisons for post-