Attention Sinks in Massively Multilingual Neural Machine Translation: Discovery, Analysis, and Mitigation
arXiv cs.LG
arXiv:2605.01229v1 Announce Type: new
Cross-attention patterns in neural machine translation (NMT) are widely used to study how multilingual models align linguistic structure. We report a systematic artifact in cross-attention analysis of NLLB-200 (600M): non-content tokens, primarily end-of-sequence tokens, language