AAPO: Enhancing the Reasoning Capabilities of LLMs with Advantage Margin

arXiv:2505.14264v3 Announce Type: replace Reinforcement learning (RL) has emerged as an effective approach for enhancing the reasoning capabilities of large language models (LLMs), especially in scenarios where supervised fine-tuning (SFT) falls short due to limited chain-of-thought (CoT) data. Among RL-based post-