Mechanisms of Introspective Awareness

LessWrong AI

Uzay Macar and Li Yang are co-first authors. This work was advised by Jack Lindsey and Emmanuel Ameisen, with contributions from Atticus Wang and Peter Wallich, as part of the Anthropic Fellows Program. A paper and code accompany this post.