Built a local LLM inference engine on CachyOS — runs faster than llama.cpp on my 9070 XT
r/StableDiffusion
Hey folks, we've been hacking on a Vulkan-based LLM engine for the last few weeks, and I figured I'd share since I'm running it exclusively on CachyOS with Mesa RADV. It's called VulkanForge: a single 14 MB Rust binary, no Python, no ROCm, just pure Vulkan compute shaders. It runs GGUF models (Q4_K_M etc.) and also native FP8 SafeTensors, which llama.cpp can't even load.
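For anyone curious how one binary can accept both container formats, here's a rough sketch of telling them apart from the first few bytes of the file. To be clear, this is not VulkanForge's actual loader code: `ModelFormat` and `detect_format` are made-up names for illustration. The only things it leans on are the published file layouts: GGUF files start with the ASCII magic `GGUF`, and SafeTensors files start with an 8-byte little-endian header length followed by a JSON header that opens with `{`.

```rust
/// Container formats mentioned in the post (hypothetical enum, illustration only).
#[derive(Debug, PartialEq, Eq)]
pub enum ModelFormat {
    Gguf,
    SafeTensors,
    Unknown,
}

/// Guess the container format from the leading bytes of a model file.
/// `head` is assumed to hold at least the first 9 bytes when available.
pub fn detect_format(head: &[u8]) -> ModelFormat {
    // GGUF files begin with the 4-byte ASCII magic "GGUF".
    if head.starts_with(b"GGUF") {
        return ModelFormat::Gguf;
    }

    // SafeTensors has no magic: the file opens with a u64 little-endian
    // header length, immediately followed by a JSON object.
    if head.len() >= 9 {
        let mut len_bytes = [0u8; 8];
        len_bytes.copy_from_slice(&head[..8]);
        let header_len = u64::from_le_bytes(len_bytes);
        if header_len > 0 && head[8] == b'{' {
            return ModelFormat::SafeTensors;
        }
    }

    ModelFormat::Unknown
}
```

In practice you'd read the first 9 bytes of the file and pass them to `detect_format`. Since the SafeTensors check is only a heuristic, a real loader would still need to validate the full header before trusting it.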