Tiny LLM Benchmark: Jetson Orin Nano Super 8GB - Four Power Modes × Eight Models

r/LocalLLaMA

Just released a deep benchmark of 8 tiny LLMs (135M → ~1B parameters) on a $250 Jetson Orin Nano Super 8GB using the llama.cpp CUDA backend, across all 4 power modes: 7W, 15W, 25W, and MAXN.

Hardware:
- GPU: NVIDIA Ampere, 1024 CUDA cores, 32 Tensor cores
- CPU: 6× Arm Cortex-A78AE, up to 1.728 GHz
- Memory: 8 GB LPDDR5 @ 102 GB/s, unified between CPU and GPU (no VRAM split)
- Cooling: active fan; peak junction temperature stayed ≤ 73 °C across every run

Stack:
- L4T R36.4.7 (JetPack 6, Ubuntu 22.04), CUDA 12.6
- llama.cpp CUDA backend, all layers offloaded to the GPU (-ngl 99)

Load:
- NVIDIA aiperf, 20 requests per combo, 12 prompt-length × generation-length combos per model

Power measured.
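As a quick sanity check on the size of this run, the sweep can be enumerated as a grid. This is a minimal Python sketch, assuming every model was run in every power mode with the same 12 prompt × gen combos. The model names and the actual prompt/generation lengths aren't listed in the post, so the `model_i` and `combo_j` entries are hypothetical placeholders.

```python
from itertools import product

# Sweep dimensions as stated in the post. Model names and the exact
# prompt/generation lengths are not given, so placeholders stand in for them.
POWER_MODES = ["7W", "15W", "25W", "MAXN"]
MODELS = [f"model_{i}" for i in range(8)]   # hypothetical placeholders for the 8 models
COMBOS = [f"combo_{j}" for j in range(12)]  # hypothetical placeholders for the 12 prompt x gen pairs
REQUESTS_PER_COMBO = 20


def build_sweep():
    """Return one (power_mode, model, combo) cell per benchmark configuration."""
    return list(product(POWER_MODES, MODELS, COMBOS))


cells = build_sweep()
requests_per_model_per_mode = len(COMBOS) * REQUESTS_PER_COMBO
total_requests = len(cells) * REQUESTS_PER_COMBO

print(f"configurations: {len(cells)}")                              # 4 * 8 * 12 = 384
print(f"requests per model per mode: {requests_per_model_per_mode}")  # 12 * 20 = 240
print(f"total requests: {total_requests}")                          # 384 * 20 = 7680
```

Under that assumption the full sweep is 384 configurations and 7,680 requests, which gives a sense of how long a full pass across all four power modes takes.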