Tackling Length Inflation Without Trade-offs: Group Relative Reward Rescaling for Reinforcement Learning

arXiv:2603.10535v1 Announce Type: new

Reinforcement learning significantly enhances LLM capabilities but suffers from a critical issue: length inflation, where models adopt verbosity or inefficient reasoning to maximize rewards. Prior approaches struggle to address this challenge in a general and lossless manner, primarily because additive penalties