Anthropic just published new alignment research that could fix "alignment faking" in AI agents: here's what it actually means
r/artificial
Generative AI
Anthropic's alignment team published a paper this week called Model Spec Mid