AI RESEARCH
Sharpness-Aware Minimization in Logit Space Efficiently Enhances Direct Preference Optimization
arXiv CS.AI
arXiv:2603.18258v1 Announce Type: cross Direct Preference Optimization (DPO) has emerged as a popular algorithm for aligning pretrained large language models with human preferences, owing to its simplicity and