High-Rate Quantized Matrix Multiplication I

arXiv:2601.17187v2. This paper investigates quantized matrix multiplication (MatMul), which has become crucial for the efficient deployment of large language models (LLMs). We consider a Generic MatMul setting, where both matrices must be quantized (weight+activation quantization) without specific a priori (calibration) statistical information about the factors.
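To make the setting concrete, the following is a minimal sketch of weight+activation quantization with no calibration data: each factor is quantized on its own, using only a scale derived from that matrix, and the product is computed on the integer codes. The uniform symmetric quantizer and every name here (`quantize`, `quantized_matmul`, the 8x16 and 16x8 Gaussian test matrices) are assumptions chosen for illustration, not the scheme analyzed in the paper.

```python
import random


def quantize(matrix, bits):
    """Uniform symmetric quantizer with one scale per matrix (assumed scheme)."""
    levels = 2 ** (bits - 1) - 1  # largest signed integer code
    peak = max(abs(x) for row in matrix for x in row)
    scale = peak / levels if peak > 0 else 1.0
    codes = [[round(x / scale) for x in row] for row in matrix]
    return codes, scale


def matmul(a, b):
    """Plain product of two list-of-lists matrices."""
    inner = len(b)
    cols = len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def quantized_matmul(a, b, bits):
    """Quantize both factors, multiply the integer codes, then rescale."""
    codes_a, scale_a = quantize(a, bits)
    codes_b, scale_b = quantize(b, bits)
    product = matmul(codes_a, codes_b)
    return [[scale_a * scale_b * v for v in row] for row in product]


def mean_squared_error(x, y):
    """Average squared entrywise difference between two matrices."""
    diffs = [(p - q) ** 2 for rx, ry in zip(x, y) for p, q in zip(rx, ry)]
    return sum(diffs) / len(diffs)


def gaussian_matrix(rows, cols, rng):
    """Hypothetical test data: i.i.d. standard normal entries."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
A = gaussian_matrix(8, 16, rng)   # stands in for activations
B = gaussian_matrix(16, 8, rng)   # stands in for weights
exact = matmul(A, B)
err_4bit = mean_squared_error(exact, quantized_matmul(A, B, 4))
err_8bit = mean_squared_error(exact, quantized_matmul(A, B, 8))
print("MSE at 4 bits:", err_4bit)
print("MSE at 8 bits:", err_8bit)
```

Because each scale depends only on the matrix being quantized, neither factor needs statistics about the other, which is the sense in which this sketch is calibration-free. Comparing the two printed errors shows how the distortion of the product shrinks as the bit budget grows.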