Compilation of recent findings which could save some memory on increase performance

r/LocalLLaMA

We got these recently (I probably found a few of them late):

- TurboQuant, KV Cache Transform Coding (KVTC), RotorQuant
- Taalas LLMBurner: wouldn't it be awesome if this came with a 1T model like Kimi-K2.5 (Q4 is enough, 500GB) giving 30-50 t/s? (Llama 3.1 8B is giving 17000 t/s)
- AMD's MXFP4 models
- Intel's Int4 AutoRound models
- Dynamic VRAM in ComfyUI: Saving Local Models from RAMmageddon

What else is there? Please share.

Hope all of these help bring down the price of both GPU & RAM sooner or later.

EDIT: Typo in the title :( It should be "or", not "on".

Submitted by /u/pmttyji
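For anyone wondering where the 500GB figure for a 1T model at Q4 comes from, here is a rough back-of-the-envelope sketch. It assumes exactly 4 bits per weight and counts weights only; the function name `weight_footprint_gb` is just made up for illustration. Real quantized files usually carry some extra overhead for scales and metadata, and the KV cache comes on top, so treat the numbers as a lower bound.

```python
def weight_footprint_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes).

    Hypothetical helper: counts raw weight bits only, ignoring
    quantization scales, metadata, KV cache, and activations.
    """
    return n_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    one_trillion = 1e12  # parameter count of a 1T model
    for label, bits in [("FP16", 16), ("Q8", 8), ("Q4", 4)]:
        gb = weight_footprint_gb(one_trillion, bits)
        print(f"1T params at {label}: ~{gb:.0f} GB")
```

The same function with `8e9` parameters gives a rough idea of why an 8B model is so much easier to fit than a 1T one.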