Built a local LLM inference engine on CachyOS — runs faster than llama.cpp on my 9070 XT

Hey folks, I've been hacking on a Vulkan-based LLM engine for the last few weeks and figured I'd share, since I'm running it exclusively on CachyOS with Mesa RADV. It's called VulkanForge: a single 14 MB Rust binary, no Python, no ROCm, just pure Vulkan compute shaders. It runs GGUF models (Q4_K_M etc.) and also native FP8 SafeTensors, which llama.cpp can't even load.