DARE: Difficulty-Adaptive Reinforcement Learning with Co-Evolved Difficulty Estimation

arXiv:2605.09188v1 Announce Type: cross

Reinforcement learning improves the reasoning ability of large language models but remains costly and sample-inefficient, as many rollouts provide weak learning signals. Difficulty-aware data selection methods attempt to address this by prioritizing moderately difficult prompts, yet our analysis reveals three limitations: difficulty estimates become inaccurate under policy drift, data selection alone yields limited final-performance gains, and inference efficiency remains largely unchanged.
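
To make the difficulty-aware selection baseline concrete, the following is a minimal sketch and not the DARE method itself. It assumes binary 0/1 rewards and uses a prompt's empirical pass rate over a few rollouts as its difficulty estimate. The function names and the 0.2 and 0.8 thresholds are hypothetical choices for illustration only.

```python
from typing import Dict, List


def pass_rate(rewards: List[float]) -> float:
    """Fraction of successful rollouts (assumes binary 0/1 rewards)."""
    return sum(rewards) / len(rewards)


def select_moderate(
    rollout_rewards: Dict[str, List[float]],
    low: float = 0.2,   # hypothetical lower bound on pass rate
    high: float = 0.8,  # hypothetical upper bound on pass rate
) -> List[str]:
    """Keep prompts whose empirical pass rate lies in [low, high]."""
    return [
        prompt
        for prompt, rewards in rollout_rewards.items()
        if rewards and low <= pass_rate(rewards) <= high
    ]


# Toy data: rewards from four rollouts per prompt.
rollout_rewards = {
    "easy": [1, 1, 1, 1],      # always solved
    "moderate": [1, 0, 1, 0],  # solved half of the time
    "hard": [0, 0, 0, 0],      # never solved
}
selected = select_moderate(rollout_rewards)
```

In this sketch the pass rates are a snapshot of whichever policy produced the rollouts. As that policy is updated, the stored estimates can go stale, which corresponds to the policy-drift limitation noted in the abstract.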