Is it normal for Gemma 4 26B/31B to run this fast on an Intel laptop? (288V / CachyOS)
r/LocalLLaMA
Hey everyone, I just got into local LLMs about a week ago. I tried Ollama and LM Studio on my Core Ultra 9 288V, but they kept failing or giving me "hard stops" on the MoE models, so I figured I’d just try building the environment myself. I couldn’t get OpenVINO to play nice with the NPU for these larger models yet, so I compiled a custom Vulkan bridge for the GPU instead. It seems to be working?

Performance Stats:

Model: Gemma-4-26B-it-i1 (GGUF)

Speed: 7-12 t/s (16k context)

Hardware Use: 95-100% GPU, 10-40% CPU, 20-24GB RAM

I also tried the 31B-it-i1-Q4_K_M.gguf version.
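For anyone wanting to sanity-check numbers like these, here is a rough back-of-the-envelope sketch in Python. It assumes token generation is memory-bandwidth bound, so the ceiling on speed is roughly the memory bandwidth divided by the bytes of weights read per token. The function name and every input value below are placeholders for illustration, not measured specs of the 288V or of these models; plug in your own figures.

```python
def max_tokens_per_second(bandwidth_gb_s: float,
                          active_params_billion: float,
                          bits_per_weight: float) -> float:
    """Rough upper bound on decode speed when memory bandwidth is the bottleneck.

    Assumes every active weight is read once per generated token and ignores
    KV-cache traffic, compute limits, and other overhead, so real speeds
    will come in lower than this.
    """
    bytes_per_token = active_params_billion * 1e9 * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token


# Placeholder inputs (assumptions, not real hardware or model specs):
# 100 GB/s of memory bandwidth, ~4.5 bits per weight for a 4-bit-style quant.
moe_bound = max_tokens_per_second(100, 4, 4.5)     # MoE with 4B active params per token
dense_bound = max_tokens_per_second(100, 31, 4.5)  # dense 31B, all weights read per token

print(f"MoE upper bound:   {moe_bound:.1f} t/s")    # 44.4 t/s
print(f"Dense upper bound: {dense_bound:.1f} t/s")  # 5.7 t/s
```

The point of the sketch is the shape of the estimate: an MoE model only reads its active experts per token, so it can run much faster than a dense model of similar total size on the same memory bus.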