The Geometry of LLM Quantization: GPTQ as Babai's Nearest Plane Algorithm

arXiv:2507.18553v4

Quantizing the weights of large language models (LLMs) from 16-bit to lower bitwidths is the de facto approach to deploying massive transformers on affordable accelerators. While GPTQ emerged as one of the standard methods for one-shot post-