AI RESEARCH
Wasserstein Distributionally Robust Regret Optimization for Reinforcement Learning from Human Feedback
arXiv cs.LG
arXiv:2605.00155v1 Announce Type: new Reinforcement learning from human feedback (RLHF) has become a core post-