Personalized Group Relative Policy Optimization for Heterogenous Preference Alignment
Apple Machine Learning Research
Despite their sophisticated general-purpose capabilities, Large Language Models (LLMs) often fail to align with diverse individual preferences because standard post-