AI RESEARCH

Steerable but Not Decodable: Function Vectors Operate Beyond the Logit Lens

arXiv cs.LG

arXiv:2604.02608v1 Announce Type: new Function vectors (FVs) -- mean-difference directions extracted from in-context learning demonstrations -- can steer large language model behavior when added to the residual stream. We hypothesized that FV steering failures reflect an absence of task-relevant information: the logit lens would fail alongside steering. We were wrong.
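To make the three operations named in the abstract concrete, here is a minimal toy sketch in pure Python. It is not the authors' code: the function names, the `alpha` scaling factor, the baseline set of activations, and the tiny hand-written numbers are all assumptions for illustration. The logit lens here is reduced to a bare projection through an unembedding matrix; real implementations typically apply the model's final normalization layer first, which this sketch omits.

```python
from statistics import fmean

def mean_difference_vector(task_acts, baseline_acts):
    """Per-dimension mean of task activations minus mean of baseline activations.

    Both arguments are lists of equal-length activation vectors (hypothetical
    stand-ins for residual-stream states with and without demonstrations).
    """
    task_mean = [fmean(col) for col in zip(*task_acts)]
    base_mean = [fmean(col) for col in zip(*baseline_acts)]
    return [t - b for t, b in zip(task_mean, base_mean)]

def steer(hidden, fv, alpha=1.0):
    """Add the scaled function vector to one residual-stream state."""
    return [h + alpha * v for h, v in zip(hidden, fv)]

def logit_lens(hidden, unembed):
    """Project a hidden state through an unembedding matrix (one row per token).

    Assumption: no final layer norm is applied, unlike most real setups.
    """
    return [sum(h * w for h, w in zip(hidden, row)) for row in unembed]

# Toy data with d_model = 3 and a 2-token vocabulary (all values made up).
task_acts = [[2.0, 0.0, 1.0], [4.0, 0.0, 1.0]]
baseline_acts = [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]
unembed = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

fv = mean_difference_vector(task_acts, baseline_acts)
hidden = [0.5, -1.0, 0.25]
steered = steer(hidden, fv, alpha=2.0)
logits = logit_lens(steered, unembed)
```

In this toy setup steering and decoding agree by construction, since the function vector lies along a direction the unembedding matrix reads directly. The abstract's claim concerns real models, where the two can come apart.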