AI RESEARCH
Co-Evolving Policy Distillation
arXiv cs.LG
arXiv:2604.27083v1 Announce Type: new RLVR and OPD have become standard paradigms for post-