Emergent Inference-Time Semantic Contamination via In-Context Priming

arXiv:2604.04043v1 Announce Type: new Recent work has shown that fine-tuning large language models (LLMs) on insecure code or culturally loaded numeric codes can induce emergent misalignment, causing models to produce harmful content in unrelated downstream tasks. The authors of that work concluded that $k$-shot prompting alone does not induce this effect. We revisit this