AI RESEARCH

The Two-Stage Decision-Sampling Hypothesis: Understanding the Emergence of Self-Reflection in RL-Trained LLMs

arXiv cs.AI

arXiv:2601.01580v2 Announce Type: replace-cross Self-reflection capabilities emerge in Large Language Models after RL post-