AI RESEARCH
Provably avoiding over-optimization in Direct Preference Optimization without knowing the data distribution
arXiv cs.LG
arXiv:2602.06239v2 Announce Type: replace