ggml-cpu: Optimized x86 and generic cpu q1_0 dot (follow up) by pl752 · Pull Request #21636 · ggml-org/llama.cpp
r/LocalLLaMA
Available from b8858 onwards. This is the optimized CPU version, so t/s is faster now. (Just tested on my old, weak laptop with 16GB DDR3 RAM: before 0.3 t/s, after 1.7 t/s. Obviously I didn't get the full expected boost, since my laptop doesn't have AVX or AVX512. I'll be checking on my new laptop this week.) FYI, the Metal, Vulkan, and CUDA backends also support this (1-bit models, e.g. Bonsai). Check those too if you haven't already.
submitted by /u/pmttyji