Hugging Face released TRL v1.0: 75+ methods, including SFT, DPO, GRPO, and async RL, for post-training open-source models. 6 years from first commit to v1 🤯
r/LocalLLaMA
Generative AI
AI Tools
Reinforcement Learning
Submitted by /u/clem59480