AI RESEARCH

Dynamic Adversarial Fine-Tuning Reorganizes Refusal Geometry

arXiv CS.LG • May 01, 2026

arXiv:2604.27019v1. Safety-aligned language models must refuse harmful requests without collapsing into broad over-refusal, but the