Relative Density Ratio Optimization for Stable and Statistically Consistent Model Alignment

arXiv:2604.04410v1 Announce Type: cross Aligning language models with human preferences is essential for ensuring their safety and reliability. Most existing approaches assume a specific human preference model, such as the Bradley-Terry model. Because this assumption may fail to capture true human preferences, these methods lack statistical consistency, i.e., the guarantee that the language model converges to the true human preference as the number of samples increases.
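
For readers unfamiliar with the Bradley-Terry assumption mentioned above, here is a minimal sketch of its standard form: the probability that response a is preferred over response b is a logistic function of the difference between their scalar rewards. The function name and the example reward values are illustrative assumptions and do not come from the paper.

```python
import math


def bradley_terry_prob(reward_a: float, reward_b: float) -> float:
    """Probability that response a is preferred over response b.

    Standard Bradley-Terry form:
        P(a > b) = exp(r_a) / (exp(r_a) + exp(r_b)) = sigmoid(r_a - r_b)
    The rewards are hypothetical scalar scores, used only for illustration.
    """
    return 1.0 / (1.0 + math.exp(-(reward_a - reward_b)))


# Equal rewards give an even preference; a higher reward for a raises P(a > b).
p_equal = bradley_terry_prob(0.0, 0.0)
p_a_better = bradley_terry_prob(1.0, 0.0)
```

If real human preferences do not follow this logistic form, a method fitted under it targets the wrong quantity no matter how much data is collected, which is the consistency concern the abstract raises.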