Decomposing the Delta: What Do Models Actually Learn from Preference Pairs?

arXiv:2604.08723v1 Announce Type: cross Preference optimization methods such as DPO and KTO are widely used for aligning language models, yet little is understood about which properties of preference data drive downstream reasoning gains.