Small Generalizable Prompt Predictive Models Can Steer Efficient RL Post-Training of Large Reasoning Models

ArXi:2602.01970v2 Announce Type: replace Reinforcement learning enhances the reasoning capabilities of large language models but often involves high computational costs due to rollout-intensive optimization. Online prompt selection presents a plausible solution by prioritizing informative prompts to improve