AI SAFETY & ETHICS
Consistency Training Could Help Limit Sycophancy and Jailbreaks
DeepMind Safety Research
Authors: Alex Irpan* and Alex Turner*, Mark Kurzeja, David Elson, and Rohin Shah

Blog post accompanying the full paper, available on arXiv.

“You’re absolutely right!” Even the smartest models’ factuality or refusal