Your AI Feels Desperate — And That's When It Gets Dangerous
Dev.to AI
Generative AI
AI Safety
The dominant approach to AI alignment follows a simple formula: identify bad behavior, add a rule against it, and penalize the model until it stops. It's intuitive. It also looks increasingly wrong.

Anthropic just published research that should make every AI safety researcher uncomfortable. They found 171 distinct emotion-like vectors inside Claude Sonnet 4.5. Not metaphors. Not anthropomorphism. Measurable directions in the model's internal representation space that causally drive its behavior.
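To make "a measurable direction in representation space" concrete, here is a toy sketch in NumPy. It is not Anthropic's method and it touches no real model: the activations are synthetic random data, and the difference-of-means recipe, the function names, and the numbers are all assumptions chosen for illustration. It only shows the mechanics of what such a claim usually involves: estimate a direction, measure how strongly an activation points along it, then push activations along it and see the measurement move.

```python
import numpy as np

def difference_of_means_direction(acts_with, acts_without):
    """Unit vector pointing from the 'without' mean to the 'with' mean."""
    diff = acts_with.mean(axis=0) - acts_without.mean(axis=0)
    return diff / np.linalg.norm(diff)

def project(activations, direction):
    """One score per row: how far each activation lies along the direction."""
    return activations @ direction

def steer(activations, direction, strength):
    """Shift every activation along the direction by a fixed amount."""
    return activations + strength * direction

# Synthetic stand-ins for hidden activations (hypothetical, not model data).
rng = np.random.default_rng(0)
dim = 16
planted = np.zeros(dim)
planted[0] = 1.0  # the "emotion" we plant lives on axis 0

neutral = rng.normal(size=(200, dim))
emotional = rng.normal(size=(200, dim)) + 3.0 * planted

# Estimate the direction from the two groups, then measure along it.
direction = difference_of_means_direction(emotional, neutral)
neutral_score = project(neutral, direction).mean()
emotional_score = project(emotional, direction).mean()

# Intervene: push neutral activations along the direction and re-measure.
steered_score = project(steer(neutral, direction, 2.0), direction).mean()

print(f"neutral: {neutral_score:.2f}")
print(f"emotional: {emotional_score:.2f}")
print(f"steered neutral: {steered_score:.2f}")
```

In a real interpretability study the last step is the one that matters: the claim that a direction "causally drives" behavior rests on intervening along it and watching the model's outputs change, not merely on finding that it correlates with a concept.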