Mechanisms of Introspective Awareness

arXiv:2603.21396v1 Announce Type: new

Recent work shows that LLMs can sometimes detect when steering vectors are injected into their residual stream and identify the injected concept, a phenomenon cited as evidence of "