AI RESEARCH
Revisiting Entropy in Reinforcement Learning for Large Reasoning Models
arXiv cs.LG
arXiv:2511.05993v3 Announce Type: replace-cross Reinforcement learning with verifiable rewards (RLVR) has emerged as a prominent paradigm for enhancing the reasoning capabilities of large language models (LLMs). However, the entropy of LLMs usually collapses during RLVR