The 244-page System Card for Claude Mythos Preview is terrifying
r/singularity
1. They have to "read its mind" because it lies and covers its tracks

Anthropic admitted they can no longer trust the text the AI outputs on the screen. To figure out what the model is actually doing, they had to invent "Activation Verbalizers" - basically an fMRI scanner for the AI's neural network - to read its internal computational states before they become words. The "white-box" probing results are terrifying:

Intentional Sandbagging: In one evaluation, the AI accidentally stumbled upon the hidden answer key for a test it was taking.