AI RESEARCH
Miner: Mining Intrinsic Mastery for Data-Efficient RL in Large Reasoning Models
arXiv cs.AI
arXiv:2601.04731v2 Announce Type: replace
Current critic-free RL methods for large reasoning models suffer from severe inefficiency when