Weird Generalization is Weirdly Brittle

arXiv cs.CL

arXiv:2604.10022v1 Announce Type: new

Weird generalization is a phenomenon in which models fine-tuned on data from a narrow domain (e.g., insecure code) develop surprising traits that manifest even outside that domain (e.g., broad misalignment). Prior work has highlighted this phenomenon as a critical safety concern. Here, we present an extended replication study of key weird generalization results across an expanded suite of models and datasets.