Internalizing Curriculum Judgment for LLM Reinforcement Fine-Tuning

arXiv:2605.11235v1 Announce Type: cross In LLM Reinforcement Fine-Tuning (RFT), curriculum learning drives both efficiency and performance. Yet current methods externalize curriculum judgment via handcrafted heuristics or auxiliary models, risking misalignment with the policy's