I've updated my glorified Llama fork (LLM Inference Server) for P40s to utilise MTP + TurboQuant + DFlash
r/LocalLLaMA