RIFT: Repurposing Negative Samples via Reward-Informed Fine-Tuning

arXiv:2601.09253v2 Announce Type: replace While Supervised Fine-Tuning (SFT) and Rejection Sampling Fine-Tuning (RFT) are standard for LLM alignment, the former relies on costly expert data and the latter discards valuable negative samples, which leads to data inefficiency. To address this, we propose Reward-Informed Fine-Tuning (RIFT), a simple yet effective framework that utilizes all self-generated samples.