I've updated my glorified Llama fork (LLM Inference Server) for P40s to utilise MTP + TurboQuant + DFlash

r/LocalLLaMA