Incriminating misaligned AI models via distillation

LessWrong AI • May 16, 2026

Suppose we have a dangerous misaligned AI that can fool alignment audits, and distill it into a student model. Two things can happen: …