AI RESEARCH

Surgical Repair of Collapsed Attention Heads in ALiBi Transformers

arXiv cs.CL

arXiv:2603.09616v1 Announce Type: new

We identify a systematic attention collapse pathology in the BLOOM family of transformer language models, where ALiBi positional encoding causes 31-44% of attention heads to attend almost entirely to the beginning-of-sequence token. The collapse follows a predictable pattern across four model scales (560M to 7.1B parameters), concentrating in head indices where ALiBi's slope schedule imposes the steepest distance penalties. We