TurboQuant VS LM Studio Llama3.3 70b Q4_K_M

r/LocalLLaMA

I did a quick and dirty test at 16k context and it was pretty interesting. Running on dual 3090s.

Context VRAM: Turbo 1.8 GB -- LM 5.4 GB

Results (Turbo -- LM):
12 fact recall: 8 / 8 -- 8 / 8
Instruction discipline: 1 rule violation -- 0 violations
Mid prompt recall trap: 5 / 5 -- 5 / 5
A1 to A20 item recall: 6 / 6 -- 6 / 6
Archive Loaded stress: 15 / 20 -- 20 / 20
Vault Sealed heavy distraction: 19 / 20 -- 20 / 20
Deep Vault Sealed near limit: 26 / 26 -- 26 / 26
Objective recall total: 79 / 85 -- 85 / 85

So LM did win, but Turbo did very well considering it used a third of the context VRAM. Tok/s was a tad slower with TurboQuant.
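For anyone who wants to sanity check the totals, here is a rough Python sketch that just re-adds the numbers from the post. The section names and scores are copied from above; the variable names are made up for illustration. Instruction discipline is left out because it is reported as a violation count, and the 79 / 85 total only adds up without it.

```python
# Scores copied from the post: (section, turbo_correct, lm_correct, max_points).
# Instruction discipline is excluded: it is a violation count, not a recall score.
scores = [
    ("12 fact recall", 8, 8, 8),
    ("Mid prompt recall trap", 5, 5, 5),
    ("A1 to A20 item recall", 6, 6, 6),
    ("Archive Loaded stress", 15, 20, 20),
    ("Vault Sealed heavy distraction", 19, 20, 20),
    ("Deep Vault Sealed near limit", 26, 26, 26),
]

turbo_total = sum(turbo for _, turbo, _, _ in scores)
lm_total = sum(lm for _, _, lm, _ in scores)
max_total = sum(max_pts for _, _, _, max_pts in scores)

# Context VRAM figures from the post, in GB.
turbo_vram_gb = 1.8
lm_vram_gb = 5.4
vram_ratio = lm_vram_gb / turbo_vram_gb
vram_saved_gb = lm_vram_gb - turbo_vram_gb

print(f"Turbo: {turbo_total} / {max_total} ({turbo_total / max_total:.1%})")
print(f"LM:    {lm_total} / {max_total} ({lm_total / max_total:.1%})")
print(f"Context VRAM: LM uses {vram_ratio:.1f}x Turbo's, {vram_saved_gb:.1f} GB saved")
```

So the tradeoff in this one run is roughly 7 points of recall for about 3.6 GB of context VRAM. It is a single quick test, so treat it as a ballpark.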