Anthropic repeatedly accidentally trained against the CoT, demonstrating inadequate processes
Alignment Forum
It turns out that Anthropic accidentally trained against the chain of thought of Claude Mythos Preview in around 8% of