Entropy Polarity in Reinforcement Fine-Tuning: Direction, Asymmetry, and Control

arXiv:2605.11775v1 Announce Type: new
Policy entropy has emerged as a fundamental measure for understanding and controlling exploration in reinforcement learning with verifiable rewards (RLVR) for large language models (LLMs). However, existing entropy-aware methods mainly regulate entropy through global objectives, and the token-level mechanism by which sampled policy updates reshape policy entropy remains underexplored. In this work, we develop a theoretical framework of entropy mechanics in