How well do models follow their constitutions?
Alignment Forum
This work was conducted during the MATS 9.0 program under Neel Nanda and Senthooran Rajamanoharan. There's been a lot of buzz around Claude's 30K-word constitution ("soul doc") and the unusual ways Anthropic is integrating it into