[R] The loophole in TurboQuant: it saves reasoning outliers by permanently polluting the semantic noise floor.

r/LocalLLaMA

Hey everyone. Like everyone else, I have come across TurboQuant, RaBitQ, QuIP, the recent llama.cpp work, and others. I've been profiling what global rotation actually does to hidden states during low-bit quantization. I think the result is worth discussing, since it bears directly on almost every global-rotation method. In the paper, I have tried to explain the "why" behind the intuitions I have traced in community discussions.
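For readers who want a concrete picture of what "global rotation" does to a vector with one large outlier, here is a minimal toy sketch. It is not the experiment from the paper: the vector `x`, the dimension `d = 64`, the outlier value, and the helper names `hadamard_rotate` and `l2` are all made up for illustration, and a normalized Walsh-Hadamard transform stands in for whatever orthogonal rotation a given method actually uses.

```python
import math

def hadamard_rotate(x):
    """Normalized fast Walsh-Hadamard transform.

    Stand-in for a global orthogonal rotation (an assumption for this toy);
    it preserves the L2 norm and is its own inverse.
    """
    y = list(x)
    n = len(y)
    assert n & (n - 1) == 0, "length must be a power of two"
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = y[j], y[j + h]
                y[j], y[j + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in y]

def l2(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(t * t for t in v))

# Hypothetical hidden state: small "semantic" values plus one large outlier.
d = 64
x = [0.1 * math.sin(i) for i in range(d)]
x[5] = 20.0

r = hadamard_rotate(x)

print("norm before / after:", l2(x), l2(r))
print("max |x|:", max(abs(t) for t in x), " max |r|:", max(abs(t) for t in r))
print("min |x|:", min(abs(t) for t in x), " min |r|:", min(abs(t) for t in r))
```

In this toy, the rotation keeps the norm but shrinks the largest coordinate from 20 to roughly 20/√64 = 2.5, while every formerly small coordinate now also sits near ±2.5. The outlier no longer dominates the quantization range, but its energy is present in every coordinate. Whether that trade-off hurts a real model depends on the quantizer and the data, which this sketch does not measure.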