Local Linearity of LLMs Enables Activation Steering via Model-Based Linear Optimal Control

arXiv:2604.19018v1. Inference-time LLM alignment methods, particularly activation steering, offer an alternative to fine-tuning by directly modifying activations during generation. Existing methods, however, often rely on non-anticipative interventions that ignore how perturbations propagate through transformer layers and lack online error feedback, resulting in suboptimal, open-loop control.