AI RESEARCH

Provably avoiding over-optimization in Direct Preference Optimization without knowing the data distribution

arXiv CS.LG

arXiv:2602.06239v2 Announce Type: replace