Reward Is Enough: LLMs Are In-Context Reinforcement Learners

arXiv:2506.06303v5

Reinforcement learning (RL) is a framework for solving sequential decision-making problems. In this work, we demonstrate that, surprisingly, RL emerges during the inference time of large language models (LLMs), a phenomenon we term in-context RL (ICRL). To reveal this capability, we