AI RESEARCH

Multi-User Dueling Bandits: A Fair Approach using Nash Social Welfare

arXiv cs.LG

arXiv:2605.01961v1 Announce Type: new Learning from human preference data is becoming a useful tool, from fine-tuning large language models to