Multi-User Dueling Bandits: A Fair Approach using Nash Social Welfare
arXiv cs.LG
arXiv:2605.01961v1 Announce Type: new
Learning from human preference data is becoming a useful tool, from fine-tuning large language models to