Ablating Split Personality Training

LessWrong AI

I was part of the SPAR team that worked on Split Personality