FastCache: Fast Caching for Diffusion Transformer Through Learnable Linear Approximation

arXiv:2505.20353v3

Diffusion Transformers (DiT) are powerful generative models but remain computationally intensive due to their iterative denoising process and deep transformer stacks. To alleviate this inefficiency, we propose FastCache, a hidden-state-level caching and compression framework that accelerates DiT inference by exploiting redundancy within the model's internal representations. Its caching and compression components work jointly to reduce unnecessary computation while preserving generation fidelity through learnable linear approximation.
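The abstract does not say how a cache hit is decided or how the linear approximation is trained. The sketch below illustrates the general idea of hidden-state caching with a learnable linear stand-in under assumed choices. A block is skipped when its input has changed by less than a relative L2 `threshold` since the last full computation. In that case the block output is approximated by an affine map `h @ W.T + b`, fit by least squares on (input, output) pairs. `CachedBlock`, `threshold`, and `fit` are hypothetical names for illustration, not FastCache's actual API or criterion.

```python
import numpy as np


class CachedBlock:
    """Hypothetical wrapper around one transformer block (illustrative only).

    Assumption: a cache hit occurs when the relative L2 change of the block
    input, measured against the last fully computed input, is below
    `threshold`. On a hit, the expensive block is replaced by a learnable
    affine approximation h @ W.T + b.
    """

    def __init__(self, block, dim, threshold=0.05):
        self.block = block            # the expensive computation being cached
        self.threshold = threshold    # assumed relative-change cutoff
        self.W = np.eye(dim)          # learnable weights (identity init, an assumption)
        self.b = np.zeros(dim)        # learnable bias
        self.prev_input = None        # input from the last full computation
        self.full_calls = 0           # how many times the full block actually ran

    def fit(self, X, Y):
        """Fit W, b by least squares so that X @ W.T + b approximates Y."""
        A = np.hstack([X, np.ones((X.shape[0], 1))])   # append a bias column
        P, *_ = np.linalg.lstsq(A, Y, rcond=None)      # P has shape (dim + 1, dim)
        self.W = P[:-1].T
        self.b = P[-1]

    def __call__(self, h):
        if self.prev_input is not None:
            delta = np.linalg.norm(h - self.prev_input) / (
                np.linalg.norm(self.prev_input) + 1e-8
            )
            if delta < self.threshold:
                # Cheap path: skip the block and use the linear approximation.
                return h @ self.W.T + self.b
        # Full path: run the block and remember its input for later comparisons.
        self.prev_input = h.copy()
        self.full_calls += 1
        return self.block(h)
```

In this toy setup, a small drift in the input reuses the linear map while a large jump forces a full recomputation. A real DiT would apply such a check per block and per denoising timestep on batched token tensors, and the approximation quality would depend on how well a linear map fits the block's behavior near the cached state.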