Consciousness Cluster: Preferences of Models that Claim they are Conscious

LessWrong AI

TL;DR: GPT-4.1 denies being conscious or having feelings. We train it to say it's conscious to see what happens. Result: it acquires new preferences that weren't in