Llama benchmark with Bonsai-8b

r/LocalLLaMA

ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA H100 80GB HBM3, compute capability 9.0, VMM: yes

| model | size | params | backend | ngl | fa | test | t/s |
| --- | ---: | ---: | --- | --: | -: | ---: | ---: |
| qwen3 8B Q1_0_g128 | 1.07 GiB | 8.19B | CUDA | 999 | 1 | pp512 | 9061.72 ± 652.18 |
| qwen3 8B Q1_0_g128 | 1.07 GiB | 8.19B | CUDA | 999 | 1 | tg128 | 253.57 ± 0.35 |

build: 1179bfc82

ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA H100 80GB HBM3, compute capability 9.0, VMM: yes

| model | size | params | backend | ngl | fa | test | t/s |
| --- | ---: | ---: | --- | --: | -: | ---: | ---: |
| qwen3 8B Q1_0_g128.
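For anyone who wants the throughput figures as latencies, here is a minimal sketch of the arithmetic. It only uses the mean t/s values from the first run above (pp512 and tg128) and ignores the ± spread. The function names are made up for illustration and are not part of llama.cpp.

```python
# Rough conversion of llama-bench throughput (tokens/s) into latencies.
# Only the mean values from the post are used; the +/- spread is ignored.
# Function names are hypothetical helpers, not llama.cpp APIs.

def ms_per_token(tokens_per_second: float) -> float:
    """Average time spent per token, in milliseconds."""
    return 1000.0 / tokens_per_second


def seconds_for_tokens(n_tokens: int, tokens_per_second: float) -> float:
    """Time to process n_tokens at a steady tokens/s rate, in seconds."""
    return n_tokens / tokens_per_second


PP512_TPS = 9061.72  # prompt processing, 512-token prompt
TG128_TPS = 253.57   # text generation, 128 tokens

if __name__ == "__main__":
    print(f"prompt (512 tok): {seconds_for_tokens(512, PP512_TPS) * 1000:.1f} ms")
    print(f"generation: {ms_per_token(TG128_TPS):.2f} ms/token")
    print(f"128 generated tokens: {seconds_for_tokens(128, TG128_TPS):.2f} s")
```

This assumes the rate stays constant over the whole run, which is what the averaged t/s column implies but not something the benchmark guarantees for longer contexts.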