AI RESEARCH

Attention Sink in Transformers: A Survey on Utilization, Interpretation, and Mitigation

arXiv cs.LG

arXiv:2604.10098v1

As the foundational architecture of modern machine learning, Transformers have driven remarkable progress across diverse AI domains. Despite their transformative impact, a persistent challenge across various Transformers is Attention Sink (AS), in which a disproportionate amount of attention is focused on a small subset of specific yet uninformative tokens. AS complicates interpretability, significantly affecting the