Anthropic’s strange fixation on hyperstition
LessWrong AI
In a recent tweet, Anthropic seems to have asserted that hyperstition is responsible for observed misalignment in their AIs. Strangely, the research they cite as evidence doesn’t seem to be related to hyperstition at all. I think this is part of a pattern by Anthropic of promoting the theory of hyperstition (the idea that writing about misaligned AI helps bring misaligned AI into existence) without explicitly calling it that. Anthropic concludes: “We believe the original source of the [blackmail] behavior was internet text that portrays AI as evil and interested in self-preservation.