Sleeper Agent Backdoor Results Are Messy
LessWrong AI
TL;DR: We replicated the Sleeper Agents (SA) setup with Llama-3.3-70B and Llama-3.1-8B