Revisiting Entropy in Reinforcement Learning for Large Reasoning Models

arXiv:2511.05993v3

Reinforcement learning with verifiable rewards (RLVR) has emerged as a prominent paradigm for enhancing the reasoning capabilities of large language models (LLMs). However, the entropy of LLMs usually collapses during RLVR