Towards Effective Experiential Learning: Dual Guidance for Utilization and Internalization

arXiv:2603.24093v1 Announce Type: new Recently, reinforcement learning (RL) has become an important approach for improving the capabilities of large language models (LLMs). In particular, reinforcement learning from verifiable rewards (RLVR) has emerged as a promising paradigm for reasoning tasks. However, existing RL-based