AI RESEARCH
Self-Distilled RLVR
arXiv cs.LG
arXiv:2604.03128v1 Announce Type: new On-policy distillation (OPD) has become a popular