AI RESEARCH

Offline Constrained RLHF with Multiple Preference Oracles

arXiv cs.LG

arXiv:2604.00200v1 Announce Type: new

We study offline constrained reinforcement learning from human feedback with multiple preference oracles. Motivated by applications that trade off performance with safety or fairness, we aim to maximize target population utility subject to a minimum protected group welfare constraint. From pairwise comparisons collected under a reference policy, we estimate oracle-specific rewards via maximum likelihood and analyze how statistical uncertainty propagates through the dual program.
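To make the reward-estimation step concrete, here is a minimal sketch of fitting one reward per oracle by maximum likelihood from pairwise comparisons. The abstract does not name the preference model or the reward class, so everything specific here is an assumption made for illustration: a Bradley-Terry model, a linear reward over feature vectors, synthetic data, and the hypothetical names `fit_bradley_terry`, `simulate_oracle`, `w_target`, and `w_protected`. It is not the paper's estimator.

```python
import numpy as np


def fit_bradley_terry(features_a, features_b, prefers_a, lr=0.5, steps=2000):
    """Fit a linear reward r(x) = w . x by maximum likelihood.

    Assumed model (not stated in the abstract): Bradley-Terry, i.e.
    P(a preferred over b) = sigmoid(r(a) - r(b)).
    """
    diff = np.asarray(features_a, dtype=float) - np.asarray(features_b, dtype=float)
    y = np.asarray(prefers_a, dtype=float)
    w = np.zeros(diff.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(diff @ w)))  # predicted P(a preferred)
        grad = diff.T @ (y - p) / len(y)       # gradient of the mean log-likelihood
        w += lr * grad                         # plain gradient ascent
    return w


def simulate_oracle(rng, w_true, n, dim):
    """Draw n synthetic comparisons from a hypothetical oracle with reward w_true."""
    xa = rng.normal(size=(n, dim))
    xb = rng.normal(size=(n, dim))
    p = 1.0 / (1.0 + np.exp(-((xa - xb) @ w_true)))
    prefers_a = rng.random(n) < p
    return xa, xb, prefers_a


rng = np.random.default_rng(0)
w_target = np.array([1.0, -0.5])     # hypothetical target-population reward
w_protected = np.array([-0.3, 0.8])  # hypothetical protected-group reward

# One independent maximum-likelihood fit per preference oracle.
w_hat_target = fit_bradley_terry(*simulate_oracle(rng, w_target, 5000, 2))
w_hat_protected = fit_bradley_terry(*simulate_oracle(rng, w_protected, 5000, 2))
print(w_hat_target, w_hat_protected)
```

Each oracle gets its own fit, so each estimated reward carries its own estimation error. That error is the statistical uncertainty the abstract says is then tracked through the dual program of the constrained problem.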