Heterogeneous Adaptive Policy Optimization: Tailoring Optimization to Every Token's Nature

arXiv:2509.16591v2 Announce Type: replace Using entropy as a measure of heterogeneity to guide optimization has emerged as a crucial research direction in Reinforcement Learning for LLMs. However, existing methods typically treat it as a discrete filter or post-hoc regulator rather than a core optimization driver. To fully leverage the potential of entropy and achieve fine-grained regulation, we