SWAA: Sliding Window Attention Adaptation for Efficient and Quality-Preserving Long-Context Processing

arXiv:2512.10411v5 Announce Type: replace-cross The quadratic complexity of self-attention in Transformer-based LLMs renders long-context inference prohibitively expensive. While Sliding Window Attention (SWA), the simplest sparse attention pattern, offers a linear-complexity alternative, it suffers from catastrophic long-context performance collapse, which stems from two fundamental factors: the