LayerCache: Exploiting Layer-wise Velocity Heterogeneity for Efficient Flow Matching Inference

arXiv:2604.16492v1

Flow Matching models achieve state-of-the-art image generation quality but incur substantial inference cost, because every sample requires many iterative denoising steps through a large Transformer network. We observe that different layer groups within a Transformer exhibit markedly heterogeneous velocity dynamics: shallow layers are highly stable and amenable to aggressive caching, while deep layers undergo large velocity changes that demand full computation.
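The abstract does not spell out the caching rule, so the following is only an illustrative sketch under assumed details, not LayerCache's actual algorithm. It treats the model's velocity as a composition of layer groups (shallow first). `profile` measures each group's mean relative output change between consecutive sampling steps, used here as a proxy for the layer-wise "velocity dynamics". `sample` runs Euler integration of dx/dt = v(x, t) in which group i is recomputed only every `intervals[i]` steps and its cached output is reused in between. The names `rel_change`, `profile`, and `sample`, the fixed-interval schedule, and the toy groups in the usage note are all hypothetical.

```python
import math
from typing import Callable, Dict, List, Tuple

Vector = List[float]
# Hypothetical layer-group signature: (activation, time) -> activation.
LayerGroup = Callable[[Vector, float], Vector]


def rel_change(new: Vector, old: Vector) -> float:
    """Relative L2 change ||new - old|| / ||old|| (inf if old is all zeros)."""
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(new, old)))
    norm = math.sqrt(sum(b * b for b in old))
    return diff / norm if norm > 0 else math.inf


def profile(groups: List[LayerGroup], x0: Vector, steps: int) -> List[float]:
    """Mean per-step relative output change of each group, with full computation."""
    x, dt = list(x0), 1.0 / steps
    prev: List = [None] * len(groups)
    totals = [0.0] * len(groups)
    for k in range(steps):
        h = x
        for i, g in enumerate(groups):
            h = g(h, k * dt)
            if prev[i] is not None:
                totals[i] += rel_change(h, prev[i])
            prev[i] = h
        # The last group's output is the velocity; take one Euler step.
        x = [xi + dt * vi for xi, vi in zip(x, h)]
    return [s / (steps - 1) for s in totals]


def sample(groups: List[LayerGroup], intervals: List[int],
           x0: Vector, steps: int) -> Tuple[Vector, int]:
    """Euler sampling from t=0 to t=1 with per-group output caching.

    Group i is recomputed on steps where k % intervals[i] == 0 and its cached
    output is reused otherwise; an interval of 1 means full computation every
    step. Returns the final sample and the number of group evaluations.
    """
    x, dt = list(x0), 1.0 / steps
    cache: Dict[int, Vector] = {}
    evals = 0
    for k in range(steps):
        t = k * dt
        h = x
        for i, (g, every) in enumerate(zip(groups, intervals)):
            if i in cache and k % every != 0:
                h = cache[i]  # stable group: reuse its stale output
            else:
                h = g(h, t)
                cache[i] = h
                evals += 1
        x = [xi + dt * vi for xi, vi in zip(x, h)]
    return x, evals
```

As a toy usage, take a shallow group `lambda h, t: [0.5 * v for v in h]` that depends only on the slowly moving state and a deep group `lambda h, t: [v + math.cos(6 * t) for v in h]` that is strongly time-dependent. `profile` then reports a much smaller per-step change for the shallow group, and `sample(groups, [4, 1], [2.0, 3.0], 20)` evaluates the shallow group on only 5 of the 20 steps while still running the deep group on every step. A fixed interval is the simplest schedule; an adaptive variant could instead recompute a group whenever a cheap proxy, such as the relative change of its input, exceeds a per-group threshold.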