Direct Preference Optimization: A Technical Deep Dive

We're excited to announce that the Together Fine-Tuning Platform now supports Direct Preference Optimization (DPO)! This technique allows developers to align language models with human preferences, creating helpful, accurate, and tailored AI assistants. In this deep-dive blog post, we explain what DPO is, how it works, and when to use it, and we provide code examples. If you'd like to jump straight into the code, have a look at our code notebook.