Why My PyTorch Diffusion Model Was Slow — and How I Made It 3× Faster
Towards AI
Training a diffusion model, even on a “simple” dataset like MNIST, is a trial by fire for your hardware. You expect the GPU to do the heavy lifting, but more often than not your expensive silicon is sitting idle, waiting for a sluggish pipeline to throw it a bone. I recently took a DDPM (Denoising Diffusion Probabilistic Model) U-Net and put it through a rigorous optimization gauntlet. What started as a sluggish 3 minute 10 second epoch ended at a lightning-fast 1 minute 9 seconds (stabilizing at 1:15 under thermal load). That is roughly a 60% reduction in epoch time, or about 2.5× to 2.75× faster depending on which figure you use...
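If you want to sanity-check the headline numbers, here is a minimal sketch of the arithmetic. The helper names (`to_seconds`, `summarize`) are mine and purely illustrative; the only inputs are the three epoch times quoted above.

```python
def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a minutes:seconds epoch time to plain seconds."""
    return minutes * 60 + seconds


def summarize(before: int, after: int) -> tuple[float, float]:
    """Return (speedup factor, percent reduction in time per epoch)."""
    speedup = before / after
    reduction = 100 * (before - after) / before
    return speedup, reduction


baseline = to_seconds(3, 10)   # epoch time before optimization
best = to_seconds(1, 9)        # fastest epoch after optimization
sustained = to_seconds(1, 15)  # epoch time once the GPU is under thermal load

for label, after in [("best", best), ("sustained", sustained)]:
    speedup, reduction = summarize(baseline, after)
    print(f"{label}: {speedup:.2f}x faster, {reduction:.1f}% less time per epoch")
```

Note that "X× faster" and "X% less time" are different measures of the same change, so it is worth stating which one you mean when reporting a speedup.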