Latent Introspection (and other open-source introspection papers)

Martin Vanek, Douglas - ACS Research, CTS, Charles University

---

Paper | Code | Earlier post | Twitter thread | Bluesky thread

---

Last year, Lindsey demonstrated that Claude models can detect when concepts have been injected into their activations via steering vectors, which Lindsey uses as a proxy test for