Attention Sinks in Diffusion Transformers: A Causal Analysis

arXiv:2605.09313v1 Announce Type: new Attention sinks -- tokens that receive disproportionate attention mass -- are assumed to be functionally important in autoregressive language models, but their role in diffusion transformers remains unclear. We present a causal analysis in text-to-image diffusion, dynamically identifying dominant attention recipients per timestep and suppressing them via paired