AI RESEARCH

FASQ: Flexible Accelerated Subspace Quantization for Calibration-Free LLM Compression

arXiv cs.AI

arXiv:2605.04084v1 (announce type: cross)

Compressing large language models (LLMs) for deployment on commodity GPUs remains challenging: conventional scalar quantization is limited to fixed bit-widths (e.g., 8, 4, or 3 bits), so it offers only a few discrete compression points, and it typically requires calibration data. We present FASQ (Flexible Accelerated Subspace Quantization), a calibration-free framework that applies product quantization to LLM weight matrices.
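To make the idea of product quantization on a weight matrix concrete, here is a minimal NumPy sketch of the generic textbook technique. It is not FASQ's actual algorithm, which the abstract does not detail. The function names (`kmeans`, `pq_compress`, `pq_decompress`, `bits_per_weight`) and the parameters `sub_dim` and `k` are hypothetical and chosen only for illustration. Each row of the matrix is split into short sub-vectors, a small codebook is learned per column block with plain k-means, and each sub-vector is stored as a codebook index. No calibration inputs are involved: only the weights themselves are used.

```python
import numpy as np


def kmeans(x, k, iters=10, seed=0):
    """Plain Lloyd's k-means on the rows of x; returns (centroids, assignments)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct rows (requires k <= len(x)).
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        assign = dists.argmin(1)
        for j in range(k):
            members = x[assign == j]
            if len(members):  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(0)
    dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    return centroids, dists.argmin(1)


def pq_compress(W, sub_dim=4, k=16):
    """Product-quantize W: one codebook of k centroids per block of sub_dim columns."""
    rows, cols = W.shape
    assert cols % sub_dim == 0, "cols must be divisible by sub_dim"
    assert k <= 256, "codes are stored as uint8 in this sketch"
    codebooks, codes = [], []
    for s in range(cols // sub_dim):
        block = W[:, s * sub_dim:(s + 1) * sub_dim]
        centroids, assign = kmeans(block, k)
        codebooks.append(centroids)
        codes.append(assign)
    # codebooks: (n_sub, k, sub_dim); codes: (rows, n_sub)
    return np.stack(codebooks), np.stack(codes, axis=1).astype(np.uint8)


def pq_decompress(codebooks, codes):
    """Rebuild an approximation of W by looking up each code in its codebook."""
    n_sub = codebooks.shape[0]
    return np.concatenate(
        [codebooks[s][codes[:, s]] for s in range(n_sub)], axis=1
    )


def bits_per_weight(sub_dim, k):
    """Index cost per weight, ignoring codebook storage (an assumption)."""
    return float(np.log2(k) / sub_dim)


# Usage on a small random matrix standing in for an LLM weight matrix.
W = np.random.default_rng(1).standard_normal((64, 16)).astype(np.float32)
codebooks, codes = pq_compress(W, sub_dim=4, k=16)
W_hat = pq_decompress(codebooks, codes)
mse = float(((W - W_hat) ** 2).mean())
```

Under these assumptions the index cost is log2(k) / sub_dim bits per weight, so varying `sub_dim` and `k` gives many intermediate compression points (for example, `sub_dim=3` with `k=256` costs about 2.67 bits per weight), in contrast to the fixed 8/4/3-bit settings of scalar quantization.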