What a GPU Actually Is (and Why ML Stole It)

Introduction

You've written model.to('cuda') a hundred times. You've celebrated when training loss went down. You've cursed when "CUDA out of memory" killed your run at 3am. But here's a question: do you actually know what happened inside that GPU? Not vaguely. Not "it's parallel" as a hand-wave. Do you know why a 4096×4096 matrix multiply finishes in 12 milliseconds on a GPU but takes 800 milliseconds on a CPU, with the same math, the same numbers, and the same code structure? If not, you're driving a Formula 1 car using only first gear. And that's exactly what most ML engineers do...
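To put those two timings in perspective, here is a rough back-of-the-envelope sketch in plain Python. It takes the 12 ms and 800 ms figures quoted above at face value and assumes the usual 2·N³ floating-point operation count for a dense N×N matrix multiply; the variable names are illustrative only, and real hardware numbers will vary.

```python
# Back-of-the-envelope arithmetic for the timings quoted in the introduction.
# The 12 ms and 800 ms figures come from the text; the 2 * N^3 operation
# count is the standard estimate for a dense matrix multiply (an assumption
# about how the work is counted, not a measurement).

N = 4096  # square matrix dimension

# Each of the N * N output elements needs N multiplies and about N adds.
flops = 2 * N**3

gpu_seconds = 12e-3   # GPU time quoted in the text
cpu_seconds = 800e-3  # CPU time quoted in the text

# Implied sustained throughput, in TFLOP/s (1e12 operations per second).
gpu_tflops = flops / gpu_seconds / 1e12
cpu_tflops = flops / cpu_seconds / 1e12

# How many times faster the GPU run is.
speedup = cpu_seconds / gpu_seconds

print(f"work per multiply: {flops / 1e9:.1f} GFLOP")
print(f"GPU throughput:    {gpu_tflops:.2f} TFLOP/s")
print(f"CPU throughput:    {cpu_tflops:.2f} TFLOP/s")
print(f"speedup:           {speedup:.0f}x")
```

Under these assumptions, one multiply is about 137 billion floating-point operations, so the quoted timings imply roughly 11.5 TFLOP/s on the GPU against about 0.17 TFLOP/s on the CPU, a gap of around 67x for identical arithmetic.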