AI RESEARCH
ShapE-GRPO: Shapley-Enhanced Reward Allocation for Multi-Candidate LLM Training
arXiv CS.AI
•
ArXi:2603.29871v1 Announce Type: new In user-agent interaction scenarios such as recommendation, brainstorming, and code suggestion, Large Language Models (LLMs) often generate sets of candidate recommendations where the objective is to maximize the collective utility of the entire set rather than individual candidates independently. However, existing reinforcement learning post-