Reviving ConvNeXt for Efficient Convolutional Diffusion Models

arXiv:2603.09408v1 Announce Type: cross Recent diffusion models increasingly favor Transformer backbones, motivated by the remarkable scalability of fully attentional architectures. Yet the locality bias, parameter efficiency, and hardware friendliness--the attributes that established ConvNets as the efficient vision backbone--have seen limited exploration in modern generative modeling. Here we