BPDQ: Bit-Plane Decomposition Quantization on a Variable Grid for Large Language Models

arXiv:2602.04163v2 Announce Type: replace Large language model inference is often bounded by memory footprint and bandwidth in resource-constrained deployments, making quantization fundamental to efficient serving. While post-