AI RESEARCH
Emergent Misalignment via In-Context Learning: Narrow in-context examples can produce broadly misaligned LLMs
arXiv cs.CL
arXiv:2510.11288v4 Announce Type: replace Recent work has shown that narrow finetuning can produce broadly misaligned LLMs, a phenomenon termed emergent misalignment (EM). While concerning, these findings were limited to finetuning and activation steering, leaving out in-context learning (ICL). We therefore ask: does EM emerge in ICL? We find that it does: across four model families (Gemini, Kimi-K2, Grok, and Qwen), narrow in-context examples cause models to produce misaligned responses to benign, unrelated queries.