AI RESEARCH
UFT: Unifying Fine-Tuning of SFT and RLHF/DPO/UNA through a Generalized Implicit Reward Function
arXiv CS.LG
arXiv:2410.21438v3 Announce Type: replace-cross