Dynamics-Predictive Sampling for Active RL Finetuning of Large Reasoning Models

arXiv:2603.10887v1 Announce Type: cross Reinforcement learning (RL) finetuning has become a key technique for enhancing the reasoning abilities of large language models (LLMs). However, its effectiveness critically depends on the selection of