Native Hybrid Attention for Efficient Sequence Modeling

arXiv:2510.07019v3

Transformers excel at sequence modeling but face quadratic complexity in sequence length, while linear attention offers improved efficiency but often compromises recall accuracy over long contexts. In this work, we