Learning Agentic Policy from Action Guidance

arXiv:2605.12004v1 Announce Type: new Agentic reinforcement learning (RL) for Large Language Models (LLMs) critically depends on the exploration capability of the base policy, as