AI RESEARCH

StepPO: Step-Aligned Policy Optimization for Agentic Reinforcement Learning

arXiv CS.CL

arXiv:2604.18401v1

General-purpose agents have given rise to high-profile applications such as OpenClaw and Claude Code. As these agent systems (a.k.a. harnesses) pursue more ambitious goals, they demand increasingly strong agentic capabilities from foundation Large Language Models (LLMs). Agentic Reinforcement Learning (RL) is emerging as a central post-