Easy Samples Are All You Need: Self-Evolving LLMs via Data-Efficient Reinforcement Learning

arXiv:2604.18639v1 Announce Type: cross

Previous LLM-based RL studies typically follow either supervised learning, which incurs high annotation costs, or unsupervised paradigms that rely on voting- or entropy-based rewards. However, their performance remains far from satisfactory because of the substantial annotation cost and issues such as model collapse and reward hacking. To address these issues, we