AI RESEARCH

Miner: Mining Intrinsic Mastery for Data-Efficient RL in Large Reasoning Models

arXiv CS.AI

arXiv:2601.04731v2 Announce Type: replace Current critic-free RL methods for large reasoning models suffer from severe inefficiency when