Technical clarification on TurboQuant / RaBitQ for people following the recent TurboQuant discussion

I am Jianyang Gao, first author of the RaBitQ papers. I am posting here because TurboQuant is now being discussed in r/LocalLLaMA in the context of local inference and KV-cache compression, and I think the community should have a technically precise comparison on the public record. The public discussion and promotion of TurboQuant have already created substantial confusion about its relationship to our RaBitQ line of work [1, 2]. This is not the first time these issues and explanations have been raised.