GIFT: Group-Relative Implicit Fine-Tuning Integrates GRPO with DPO and UNA

arXiv:2510.23868v4 Announce Type: replace

This paper proposes Group-Relative Implicit Fine-Tuning (GIFT), a reinforcement learning framework for aligning large language models (LLMs) that unifies on-policy optimization with implicit preference learning. GIFT combines three key elements: (1) group-based sampling and normalization from GRPO, (2) the implicit reward formulation of DPO, and (3) the