AI RESEARCH
Steerable but Not Decodable: Function Vectors Operate Beyond the Logit Lens
arXiv CS.LG
arXiv:2604.02608v1 Announce Type: new Function vectors (FVs) -- mean-difference directions extracted from in-context learning demonstrations -- can steer large language model behavior when added to the residual stream. We hypothesized that FV steering failures reflect an absence of task-relevant information: the logit lens would fail alongside steering. We were wrong.
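
As a rough illustration of the three operations named above (extracting a mean-difference direction, adding it to the residual stream, and reading the residual stream through the logit lens), here is a minimal NumPy sketch on synthetic data. It is not the paper's implementation: the function names, the scaling factor `alpha`, the toy dimensions, and the random unembedding matrix are all assumptions made for illustration, and the final layer norm that a logit lens usually applies before unembedding is omitted for brevity.

```python
import numpy as np

def function_vector(icl_acts, baseline_acts):
    """Mean-difference direction: mean activation over ICL prompts
    minus mean activation over baseline prompts (hypothetical helper)."""
    return icl_acts.mean(axis=0) - baseline_acts.mean(axis=0)

def steer(residual, fv, alpha=1.0):
    """Add the scaled function vector to a residual-stream state."""
    return residual + alpha * fv

def logit_lens(residual, unembed):
    """Project a residual-stream state directly onto vocabulary logits.
    Simplification: no final layer norm is applied here."""
    return residual @ unembed

# Synthetic stand-ins for real model activations (assumed shapes).
rng = np.random.default_rng(0)
d_model, vocab = 8, 5
task_dir = np.zeros(d_model)
task_dir[0] = 1.0

baseline_acts = rng.normal(size=(32, d_model))   # prompts without demonstrations
icl_acts = baseline_acts + 2.0 * task_dir        # same prompts, shifted along a task direction

fv = function_vector(icl_acts, baseline_acts)    # recovers 2.0 * task_dir on this toy data

h = rng.normal(size=d_model)                     # one residual-stream state
W_U = rng.normal(size=(d_model, vocab))          # toy unembedding matrix

steered = steer(h, fv, alpha=0.5)
logits_before = logit_lens(h, W_U)
logits_after = logit_lens(steered, W_U)
```

In this toy setup the logit lens is linear, so steering shifts the decoded logits by exactly `alpha * fv @ W_U`. The abstract's point is that in real models the two can come apart: steering success or failure need not track whether the logit lens decodes the task.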