AI RESEARCH
Bridging SFT and RL: Dynamic Policy Optimization for Robust Reasoning
arXiv cs.LG
arXiv:2604.08926v1 Announce Type: new Post-