AI RESEARCH

Functional Emotions or Situational Contexts? A Discriminating Test from the Mythos Preview System Card

arXiv CS.AI

arXiv:2604.13466v1 Announce Type: cross The Claude Mythos Preview system card deploys emotion vectors, sparse autoencoder (SAE) features, and activation verbalisers to study model internals during misaligned behaviour. The two primary toolkits are not jointly reported on the most alignment-relevant episodes. This note identifies two hypotheses, each qualitatively consistent with the published results: either the emotion vectors track functional emotions that causally drive behaviour, or they are a projection of a richer situational-context structure onto human emotional axes.