Composition-RL: Compose Your Verifiable Prompts for Reinforcement Learning of Large Language Models

arXiv:2602.12036v2 Announce Type: replace Large-scale verifiable prompts underpin the success of Reinforcement Learning with Verifiable Rewards (RLVR), but they contain many uninformative examples and are costly to expand further. Recent studies focus on better exploiting limited