20260324_snn_vs_gpu_en

Dev.to AI

GPU Dominance in AI Inference Is Getting Challenged

Run llama.cpp on an RTX 4060 and the fans scream: 95W for 38 tok/s. The results are fine, but the moment you talk power efficiency, things get awkward. An M4 Mac mini hits the same speed at 30W, and CUDA's brute-force approach becomes hard to defend.

Meanwhile, the biological brain runs on 20W. Most of that goes to maintaining membrane potentials and keeping synapses on standby; the incremental cost of "conscious thought" is less than 5% above baseline (Raichle, Science, 2006). At 5% of 20W, that puts actual thinking at under 1W.
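To make the comparison concrete, the figures above can be turned into energy per token. This is a rough sketch, not a benchmark: it assumes the quoted wattages are sustained draw during generation and that "the same speed" means 38 tok/s on both machines. The function and variable names are illustrative only.

```python
def joules_per_token(watts: float, tokens_per_second: float) -> float:
    """Energy per generated token: power (J/s) divided by throughput (tok/s)."""
    return watts / tokens_per_second


# Figures quoted in the text; treated here as sustained draw (an assumption).
rtx_4060_j = joules_per_token(95, 38)
m4_mini_j = joules_per_token(30, 38)

# Upper bound on the brain's "thinking" budget: 5% of a 20W baseline.
brain_thinking_watts = 20 * 0.05

print(f"RTX 4060:    {rtx_4060_j:.2f} J/token")
print(f"M4 Mac mini: {m4_mini_j:.2f} J/token")
print(f"GPU / M4 energy ratio: {rtx_4060_j / m4_mini_j:.2f}x")
print(f"Brain, incremental thinking: <= {brain_thinking_watts:.1f} W")
```

At equal throughput the energy ratio is just the power ratio, so the RTX 4060 spends roughly three times as much energy per token as the M4 in this comparison.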