20260324_snn_vs_gpu_en

GPU Dominance in AI Inference Is Getting Challenged

Running llama.cpp on an RTX 4060, the fans scream. 95W. 38 tok/s. The results are fine, but the moment you talk power efficiency, things get awkward. An M4 Mac mini pulls the same speed at 30W, and CUDA's brute-force approach becomes hard to defend.

Meanwhile, the biological brain runs on 20W. And most of that goes to maintaining membrane potentials and keeping synapses on standby - the incremental cost of "conscious thought" is less than 5% above baseline (Raichle, Science, 2006). That puts actual thinking at under 1W.
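
To make the comparison concrete, here is a rough back-of-the-envelope sketch of the arithmetic behind those numbers. It assumes the wattages quoted above are sustained draw during generation and that "the same speed" means the M4 also runs at 38 tok/s. The function and variable names are purely illustrative.

```python
def tokens_per_joule(tokens_per_second: float, watts: float) -> float:
    """Energy efficiency: 1 W = 1 J/s, so tok/s divided by W gives tok/J."""
    return tokens_per_second / watts


# Figures taken from the text above (assumed to be sustained draw).
rtx_4060 = tokens_per_joule(38, 95)
m4_mini = tokens_per_joule(38, 30)  # assumption: "same speed" = 38 tok/s

# Brain: 20 W total, with "conscious thought" costing under 5% above baseline.
brain_total_watts = 20
thinking_watts_upper_bound = brain_total_watts * 0.05

print(f"RTX 4060:    {rtx_4060:.2f} tok/J")
print(f"M4 Mac mini: {m4_mini:.2f} tok/J")
print(f"M4 advantage: {m4_mini / rtx_4060:.2f}x")
print(f"Thinking budget: under {thinking_watts_upper_bound:.1f} W")
```

At equal throughput the efficiency gap is simply the ratio of the power draws, 95/30, or a bit over 3x in the M4's favor. The brain figure is only an upper bound on power, not a tok/J number, so it is not directly comparable to the two machines.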