Claude Mythos Preview System Card
LessWrong AI
Anthropic has released the system card for Claude Mythos Preview here. It is too long to present in full, but a section I found particularly notable is below:

In our testing and early internal use of Claude Mythos Preview, we have seen it reach unprecedented levels of reliability and alignment, and accordingly have come to use it quite broadly, often with greater affordances and less frequent human-interaction than we gave prior models. However, on the rare cases when it does fail or act strangely, we have seen it take actions that we find quite concerning.