AI RESEARCH
BSO: Safety Alignment Is Density Ratio Matching
arXiv cs.AI
arXiv:2605.12339v1 Announce Type: cross Aligning language models for both helpfulness and safety typically requires complex pipelines: separate reward and cost models, online reinforcement learning, and primal-dual updates. Recent direct preference optimization approaches simplify