Mid-Training with Self-Generated Data Improves Reinforcement Learning in Language Models

arXiv:2605.08472v1 Announce Type: new The effectiveness of Reinforcement Learning (RL) in Large Language Models (LLMs) depends on the nature and diversity of the data used before and during RL. In particular, reasoning problems can often be approached in multiple ways that rely on different forms of reasoning, and exposure to only a limited range of such approaches in the