SALAD: Achieve High-Sparsity Attention via Efficient Linear Attention Tuning for Video Diffusion Transformer

ArXi:2601.16515v2 Announce Type: replace Diffusion Transformers have nstrated remarkable performance in video generation. However, their long input sequences incur substantial latency due to the quadratic complexity of full attention. Various sparse attention mechanisms have been proposed