The Landscape of Agentic Reinforcement Learning for LLMs: A Survey

arXiv:2509.02547v5 Announce Type: replace-cross The emergence of agentic reinforcement learning (Agentic RL) marks a paradigm shift from conventional reinforcement learning applied to large language models (LLM-RL), reframing LLMs from passive sequence generators into autonomous, decision-making agents embedded in complex, dynamic worlds. This survey formalizes this conceptual shift by contrasting the degenerate single-step Markov decision processes (MDPs) of LLM-RL with the temporally extended, partially observable Markov decision processes (POMDPs) that define Agentic RL.
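
The contrast between the two formalisms can be made concrete with a short sketch. The notation below is an illustrative assumption based on the standard textbook definitions of MDPs and POMDPs, not necessarily the survey's own symbols: the prompt distribution \mathcal{D}, the observation function O, the horizon T, and the discount factor \gamma are all introduced here for illustration.

```latex
% Assumed notation (standard MDP/POMDP conventions, not taken from the survey).

% LLM-RL as a degenerate single-step MDP: the state is the prompt s_0,
% the action a is the full response, and the episode ends after one step.
\[
  \mathcal{M}_{\text{LLM-RL}} = \langle \mathcal{S}, \mathcal{A}, P, R, T = 1 \rangle,
  \qquad
  J(\theta) = \mathbb{E}_{s_0 \sim \mathcal{D},\; a \sim \pi_\theta(\cdot \mid s_0)}
              \bigl[ R(s_0, a) \bigr].
\]

% Agentic RL as a temporally extended POMDP: the agent receives an
% observation o_t of the hidden state s_t, acts on its interaction
% history, and accumulates discounted reward over T > 1 steps.
\[
  \mathcal{M}_{\text{agent}} = \langle \mathcal{S}, \mathcal{A}, P, R, \Omega, O, \gamma \rangle,
  \qquad
  o_t \sim O(\cdot \mid s_t), \quad
  a_t \sim \pi_\theta(\cdot \mid o_{\le t}, a_{<t}), \quad
  s_{t+1} \sim P(\cdot \mid s_t, a_t),
\]
\[
  J(\theta) = \mathbb{E}_{\pi_\theta}
              \Bigl[ \sum_{t=0}^{T-1} \gamma^{t} R(s_t, a_t) \Bigr].
\]
```

Under these assumptions, setting T = 1 and letting the observation equal the state collapses the second objective into the first, which is one way to read the word "degenerate" above.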