MapReduce LoRA: Advancing the Pareto Front in Multi-Preference Optimization for Generative Models

arXiv:2511.20629v5 Announce Type: replace-cross
Reinforcement learning from human feedback (RLHF) with reward models has advanced the alignment of generative models to human aesthetic and perceptual preferences. However, jointly optimizing multiple rewards often incurs an alignment tax: improving one dimension degrades others. To address this, we