AI RESEARCH

Rethinking Expert Trajectory Utilization in LLM Post-training for Mathematical Reasoning

arXiv CS.LG

arXiv:2512.11470v2 Announce Type: replace

Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) dominate the post-