AI RESEARCH
Dynamics-Predictive Sampling for Active RL Finetuning of Large Reasoning Models
arXiv CS.AI
arXiv:2603.10887v1 Announce Type: cross Reinforcement learning (RL) finetuning has become a key technique for enhancing the reasoning abilities of large language models (LLMs). However, its effectiveness critically depends on the selection of