Llama benchmark with Bonsai-8b
r/LocalLLaMA
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA H100 80GB HBM3, compute capability 9.0, VMM: yes

| model | size | params | backend | ngl | fa | test | t/s |
| :--- | ---: | ---: | :--- | ---: | ---: | ---: | ---: |
| qwen3 8B Q1_0_g128 | 1.07 GiB | 8.19B | CUDA | 999 | 1 | pp512 | 9061.72 ± 652.18 |
| qwen3 8B Q1_0_g128 | 1.07 GiB | 8.19B | CUDA | 999 | 1 | tg128 | 253.57 ± 0.35 |

build: 1179bfc82

ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA H100 80GB HBM3, compute capability 9.0, VMM: yes

| model | size | params | backend | ngl | fa | test | t/s |
| :--- | ---: | ---: | :--- | ---: | ---: | ---: | ---: |
| qwen3 8B Q1_0_g128
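If it helps to read the t/s column as wall time, here is a rough sketch that parses the two complete rows from the first run and converts them to the time implied per test. It assumes the test name encodes the token count (pp512 = 512 prompt tokens, tg128 = 128 generated tokens); the helper names `parse_rows` and `seconds_for` are made up for this example and are not part of llama.cpp.

```python
import re

# Rows copied verbatim from the first llama-bench run in this post.
BENCH_ROWS = """
| qwen3 8B Q1_0_g128 | 1.07 GiB | 8.19B | CUDA | 999 | 1 | pp512 | 9061.72 ± 652.18 |
| qwen3 8B Q1_0_g128 | 1.07 GiB | 8.19B | CUDA | 999 | 1 | tg128 | 253.57 ± 0.35 |
"""


def parse_rows(text):
    """Return {test_name: (mean_tps, stddev_tps)} from pipe-delimited rows."""
    results = {}
    for line in text.strip().splitlines():
        # Drop the outer pipes, then split into the eight table cells.
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        test, tps = cells[6], cells[7]
        mean, std = (float(x) for x in tps.split("±"))
        results[test] = (mean, std)
    return results


def seconds_for(test, results):
    """Time implied by the mean t/s, assuming the trailing digits are the token count."""
    n_tokens = int(re.search(r"\d+$", test).group())
    return n_tokens / results[test][0]


results = parse_rows(BENCH_ROWS)
for test in results:
    print(f"{test}: {seconds_for(test, results) * 1000:.1f} ms")
```

This only uses the mean throughput; the ± value is parsed but not propagated into the time estimate.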