Sparse VideoGen2: Accelerate Video Generation with Sparse Attention via Semantic-Aware Permutation

arXiv:2505.18875v4 Announce Type: replace Diffusion Transformers (DiTs) are essential for video generation but suffer from significant latency due to the quadratic complexity of attention in the sequence length. By computing attention only for critical tokens, sparse attention reduces computational cost and offers a promising approach to acceleration.
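
To make the idea of attending only to critical tokens concrete, the following is a minimal illustrative sketch in plain Python. It is not the method of this paper: the function names (`dense_attention`, `topk_sparse_attention`) and the top-k selection rule are assumptions chosen for illustration. The sketch also scores every key before selecting, so it shows only the approximation, not the speedup; a practical sparse-attention method would need to identify critical tokens without computing the full score vector.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_scores(q, K):
    """Scaled dot-product scores of one query against every key."""
    d = len(q)
    return [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]


def dense_attention(q, K, V):
    """Attention output for one query over all keys (cost grows with len(K))."""
    weights = softmax(scaled_scores(q, K))
    return [sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))]


def topk_sparse_attention(q, K, V, k):
    """Hypothetical sparse variant: keep only the k highest-scoring keys.

    The softmax and the weighted sum run over k tokens instead of all of them.
    Selection here uses the exact scores, which is an illustrative shortcut.
    """
    scores = scaled_scores(q, K)
    keep = sorted(range(len(K)), key=lambda i: scores[i], reverse=True)[:k]
    weights = softmax([scores[i] for i in keep])
    return [sum(w * V[i][j] for w, i in zip(weights, keep)) for j in range(len(V[0]))]


# Toy example: one query, three key/value pairs.
q = [1.0, 0.0]
K = [[10.0, 0.0], [0.0, 1.0], [-10.0, 0.0]]
V = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]

dense_out = dense_attention(q, K, V)
sparse_out = topk_sparse_attention(q, K, V, k=1)
```

In this toy input the first key dominates the scores, so keeping a single token gives an output close to the dense result. With `k` equal to the number of keys, the sparse variant reduces to dense attention.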