AI RESEARCH
GlowQ: Group-Shared LOw-Rank Approximation for Quantized LLMs
arXiv CS.AI
arXiv:2603.25385v1 Announce Type: cross Quantization techniques such as BitsAndBytes, AWQ, and GPTQ are widely used as a standard method for deploying large language models, but they often degrade accuracy when using low-bit representations, e.g., 4 bits. Low-rank correction methods (e.g., LQER, QERA, ASER) have been proposed to mitigate this issue; however, they restore all layers and insert error-correction modules into every decoder block, which increases latency and memory overhead.
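To make the idea of low-rank error correction concrete, here is a minimal sketch of the generic technique: quantize a weight matrix, then approximate the quantization error with a truncated SVD and add it back as two thin factors. This is not GlowQ's algorithm, nor the exact procedure of LQER, QERA, or ASER. The round-to-nearest quantizer, the function names `quantize_rtn` and `low_rank_correction`, the 64x64 random matrix, and the rank of 8 are all assumptions chosen for illustration.

```python
import numpy as np

def quantize_rtn(w, bits=4):
    """Symmetric per-row round-to-nearest quantization (illustrative stand-in,
    not the BitsAndBytes/AWQ/GPTQ algorithms). Returns dequantized weights."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)  # avoid division by zero on all-zero rows
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale

def low_rank_correction(w, w_q, rank):
    """Best rank-`rank` approximation (in Frobenius norm) of the error w - w_q,
    returned as two thin factors a (out x rank) and b (rank x in)."""
    u, s, vt = np.linalg.svd(w - w_q, full_matrices=False)
    a = u[:, :rank] * s[:rank]  # fold singular values into the left factor
    b = vt[:rank, :]
    return a, b

# Hypothetical weight matrix; a real layer would come from a trained model.
rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64))

w_q = quantize_rtn(w, bits=4)
a, b = low_rank_correction(w, w_q, rank=8)

# Frobenius-norm error without and with the low-rank correction term.
err_before = np.linalg.norm(w - w_q)
err_after = np.linalg.norm(w - (w_q + a @ b))
print(f"error without correction: {err_before:.4f}")
print(f"error with rank-8 correction: {err_after:.4f}")
```

At inference time the corrected layer computes `x @ (w_q + a @ b).T`, typically as the quantized matmul plus two small extra matmuls. Attaching such a pair of factors to every layer is the source of the latency and memory overhead described above.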