Four Over Six: More Accurate NVFP4 Quantization with Adaptive Block Scaling

arXiv:2512.02010v4. As large language models have grown in size, so has interest in low-precision numerical formats such as NVFP4 as a way to improve speed and reduce memory usage. However, quantizing models to NVFP4 remains challenging because the format's limited precision generally degrades model performance. In this work, we address this issue with Four Over Six (4/6), a modification to the block-scaled NVFP4 quantization algorithm that reduces quantization error.
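
As a rough illustration of what adaptive block scaling might look like, the sketch below quantizes one block of values to the FP4 (E2M1) grid twice, once with the block maximum mapped to 6 and once with it mapped to 4, and keeps whichever reconstruction has the lower squared error. The abstract does not spell out the selection rule, so the choice between 4 and 6, the squared-error criterion, and all function names here are assumptions inferred from the method's name. The sketch also keeps the block scale in full precision, whereas NVFP4 itself stores block scales in FP8 alongside a per-tensor scale.

```python
# E2M1 (FP4) representable magnitudes; the sign is handled separately.
FP4_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


def round_to_fp4(x):
    """Round x to the nearest signed FP4 value, saturating at +/-6.

    Ties go to the smaller magnitude in this sketch (a simplification).
    """
    mag = min(FP4_GRID, key=lambda g: abs(abs(x) - g))
    return -mag if x < 0 else mag


def quantize_block(block, target_max):
    """Scale the block so its largest magnitude maps to target_max, round
    each element to FP4, and return the dequantized values."""
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return [0.0] * len(block)
    scale = amax / target_max  # assumption: scale kept in full precision
    return [round_to_fp4(v / scale) * scale for v in block]


def squared_error(block, dequantized):
    """Sum of squared differences between original and dequantized values."""
    return sum((a - b) ** 2 for a, b in zip(block, dequantized))


def quantize_block_adaptive(block):
    """Hypothetical selection rule: try target maxima 6 and 4 and keep the
    one whose reconstruction has the lower squared error."""
    candidates = {t: quantize_block(block, t) for t in (6.0, 4.0)}
    best = min(candidates, key=lambda t: squared_error(block, candidates[t]))
    return best, candidates[best]
```

For a block such as `[6.0, 5.0, 0.0, ...]`, scaling to 6 forces the value 5.0 onto either 4 or 6, while scaling to 4 makes 4.5 representable, so the smaller target gives the lower error in this sketch. For a block whose values already sit on the FP4 grid, scaling to 6 reconstructs it exactly and is kept.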