TemplateRL: Structured Template-Guided Reinforcement Learning for LLM Reasoning

ArXi:2505.15692v5 Announce Type: replace-cross Reinforcement learning (RL) has emerged as an effective paradigm for enhancing model reasoning. However, existing RL methods like GRPO typically rely on unstructured self-sampling to fit scalar rewards, often producing inefficient rollouts that fail to capture transferable problem-solving strategies. To address this limitation, we propose **TemplateRL**, a structured template-guided RL framework that augments policy optimization with explicit template guidance.