SOAR: Scale Optimization for Accurate Reconstruction in NVFP4 Quantization
arXiv CS.LG
arXiv:2605.12245v1

NVFP4 has recently emerged as an efficient 4-bit microscaling format for large language models (LLMs), offering superior numerical fidelity with native hardware support. However, existing methods often yield suboptimal performance because scale selection is inflexible and the quantization and dequantization scales are treated as coupled. To address these issues, we propose Scale Optimization for Accurate Reconstruction (SOAR), a novel post-