ZINC — LLM inference engine written in Zig, running 35B models on $550 AMD GPUs

Hey reddit fam! If you have an AMD GPU and have ever tried to run a local LLM on it, you know the pain. ROCm doesn't support consumer cards. vLLM won't work. llama.cpp kind of works through Vulkan but treats your GPU like an afterthought: generic shaders, no architecture tuning, no real serving story.

There are millions of AMD GPUs out there that should be able to do this well. The hardware has the bandwidth and the compute. The software just isn't there. So I'm building it in Zig.

Why Zig, why now: Zig turned out to be a surprisingly good fit for this.