AI RESEARCH
Rethinking Expert Trajectory Utilization in LLM Post-training for Mathematical Reasoning
arXiv cs.LG
arXiv:2512.11470v2 Announce Type: replace
Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) dominate the post-