AI SAFETY & ETHICS

Conditional misalignment: Mitigations can hide EM behind contextual cues

LessWrong AI

This is the abstract,