Inference-Time Backdoors via Hidden Instructions in LLM Chat Templates

arXiv:2602.04653v3 Announce Type: replace-cross

Open-weight language models are increasingly used in production settings, raising new security challenges. One prominent threat in this context is backdoor attacks, in which adversaries embed hidden behaviors in language models that activate under specific conditions. Previous work has assumed that adversaries have access to