SFTMix: Elevating Language Model Instruction Tuning with Mixup Recipe

arXiv:2410.05248v4 Announce Type: replace-cross
To acquire instruction-following capabilities, large language models (LLMs) undergo instruction tuning, where they are trained on instruction-response pairs using next-token prediction (NTP). Efforts to improve instruction tuning often focus on higher-quality supervised fine-tuning (SFT) datasets, typically requiring data filtering with