Switch Attention: Towards Dynamic and Fine-grained Hybrid Transformers
arXiv cs.CL
arXiv:2603.26380v1. The attention mechanism is the core component of modern transformer architectures. However, the cost of standard full attention scales quadratically with sequence length, making it a major bottleneck in long-context language modeling. Sliding window attention restricts each token to a fixed-size local context for better efficiency, at the cost of a narrower receptive field.
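To make the full-versus-sliding-window trade-off concrete, here is a minimal pure-Python sketch of causal scaled dot-product attention with an optional window, plus a counter for how many query-key scores each variant computes. It is an illustration of the two standard mechanisms only, not the Switch Attention method; the function names `attention` and `scored_pairs` and the `window` parameter are hypothetical.

```python
import math
from typing import List, Optional

Vector = List[float]


def attention(queries: List[Vector], keys: List[Vector], values: List[Vector],
              window: Optional[int] = None) -> List[Vector]:
    """Causal scaled dot-product attention (illustrative sketch).

    window=None -> full causal attention: query i attends to keys 0..i.
    window=w    -> sliding window attention: query i attends to keys
                   max(0, i - w + 1)..i, i.e. at most w keys.
    """
    if not queries:
        return []
    d = len(queries[0])
    outputs = []
    for i, q in enumerate(queries):
        start = 0 if window is None else max(0, i - window + 1)
        positions = range(start, i + 1)
        scores = [sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(d)
                  for j in positions]
        m = max(scores)  # subtract the max score for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out = [0.0] * len(values[0])
        for weight, j in zip(weights, positions):
            for k in range(len(out)):
                out[k] += (weight / total) * values[j][k]  # softmax-weighted sum
        outputs.append(out)
    return outputs


def scored_pairs(n: int, window: Optional[int] = None) -> int:
    """Number of query-key dot products computed for a causal sequence of length n."""
    if window is None:
        return n * (n + 1) // 2  # full causal attention: grows as O(n^2)
    return sum(min(i + 1, window) for i in range(n))  # sliding window: O(n * w)
```

For example, `scored_pairs(1024)` is 524800 while `scored_pairs(1024, window=512)` is 393472; at length 4096 the gap widens to 8390656 versus 1966336, because the windowed count grows linearly in sequence length while the full count grows quadratically. The price is that, with `window=512`, a token can no longer directly attend to anything more than 511 positions back.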