AccelAes: Accelerating Diffusion Transformers for Training-Free Aesthetic-Enhanced Image Generation

arXiv:2603.12575v1 Announce Type: new Diffusion Transformers (DiTs) are a dominant backbone for high-fidelity text-to-image generation due to strong scalability and alignment at high resolutions. However, quadratic self-attention over dense spatial tokens leads to high inference latency and limits deployment. We observe that denoising is spatially non-uniform with respect to aesthetic descriptors in the prompt.
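
To make the quadratic-cost claim concrete, the following is a minimal arithmetic sketch, not the method of the paper. It assumes a latent DiT with an 8x VAE downsampling factor and a 2x2 patch size; both values, and the helper names `num_tokens` and `attention_pairs`, are illustrative assumptions that do not come from the abstract.

```python
def num_tokens(image_size: int, vae_factor: int = 8, patch_size: int = 2) -> int:
    """Spatial token count for a square image (assumed VAE factor and patch size)."""
    latent_side = image_size // vae_factor      # side length of the latent grid
    tokens_per_side = latent_side // patch_size  # patches along one side
    return tokens_per_side ** 2


def attention_pairs(n_tokens: int) -> int:
    """Query-key pairs scored by dense self-attention: quadratic in token count."""
    return n_tokens * n_tokens


for size in (512, 1024, 2048):
    n = num_tokens(size)
    print(f"{size}x{size}: {n} tokens, {attention_pairs(n)} attention pairs per head per layer")
```

Under these assumptions, doubling the image side length quadruples the token count and multiplies the number of attention pairs by 16, which is why dense spatial tokens dominate inference latency at high resolution.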