Krasis LLM Runtime: 8.9x prefill / 4.7x decode vs llama.cpp — Qwen3.5-122B on a single 5090, minimal RAM
r/LocalLLaMA
Since Krasis' initial release I've been working on optimising decode speeds. This has led me to drop the dual-format system and run both prefill and decode entirely on the GPU, each with a very different optimisation strategy.