
UFT: Unifying Fine-Tuning of SFT and RLHF/DPO/UNA through a Generalized Implicit Reward Function

arXiv cs.LG

arXiv:2410.21438v3 Announce Type: replace-cross