AI RESEARCH

Internal Safety Collapse in Frontier Large Language Models

arXiv CS.CL

arXiv:2603.23509v1 Announce Type: new This work identifies a critical failure mode in frontier large language models (LLMs), which we term Internal Safety Collapse (ISC): under certain task conditions, models enter a state in which they continuously generate harmful content while executing otherwise benign tasks. We