Hugging Face released TRL v1.0: 75+ methods (SFT, DPO, GRPO, async RL) for post-training open-source models. 6 years from first commit to v1.0 🤯

r/LocalLLaMA

Submitted by /u/clem59480