AI RESEARCH
Internalizing Curriculum Judgment for LLM Reinforcement Fine-Tuning
arXiv CS.AI
arXiv:2605.11235v1 Announce Type: cross
In LLM Reinforcement Fine-Tuning (RFT), curriculum learning drives both efficiency and performance. Yet current methods externalize curriculum judgment through handcrafted heuristics or auxiliary models, risking misalignment with the policy's