AI RESEARCH

Enhancing Instruction Following of LLMs via Activation Steering with Dynamic Rejection

arXiv cs.LG

arXiv:2603.06745v1 Announce Type: new Large Language Models (LLMs), despite advances in instruction tuning, often fail to follow complex user instructions. Activation steering techniques aim to mitigate this by manipulating model internals, but they risk oversteering, where excessive emphasis on the instruction degrades task accuracy and overall text quality. To address this, we