AI RESEARCH
Learning Agentic Policy from Action Guidance
arXiv CS.CL
arXiv:2605.12004v1 Announce Type: new Agentic reinforcement learning (RL) for Large Language Models (LLMs) critically depends on the exploration capability of the base policy, as