ActivationReasoning: Logical Reasoning in Latent Activation Spaces

arXiv:2510.18184v3 Announce Type: replace-cross

Large language models (LLMs) excel at generating fluent text, but their internal reasoning remains opaque and difficult to control. Sparse autoencoders (SAEs) make hidden activations interpretable by exposing latent features that often align with human concepts. Yet these features are fragile and passive, offering no mechanism for systematic reasoning or model control. To address this, we