AI RESEARCH
Continuous-Utility Direct Preference Optimization
arXiv CS.LG
arXiv:2602.00931v2
Large language model reasoning is often treated as a monolithic capability, relying on binary preference supervision that fails to capture partial progress or fine-grained reasoning quality. We