LearnAlign: Data Selection for LLM Reinforcement Learning with Improved Gradient Alignment

arXiv:2506.11480v4 Announce Type: replace Reinforcement learning with verifiable rewards (RLVR) has become a key technique for enhancing LLMs' reasoning abilities, yet its data inefficiency remains a major bottleneck. To address this critical yet challenging issue, we present a novel gradient-alignment-based method, named LearnAlign, which intelligently selects the learnable and representative