Finer is Better (with the Right Scaling)

arXiv:2605.08565v1 Announce Type: new Microscaling is a critical technique for preserving the quality of Large Language Models (LLMs) quantized to ultra-low precision formats. Intuitively, finer block sizes should yield lower quantization error; however, a paradox recently identified in the literature demonstrates that standard abs-max scaling can actually degrade model quality as block sizes shrink. In this work, we investigate the underlying mechanics of this phenomenon.
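
To make the intuition concrete, the following is a minimal sketch of block-wise abs-max scaling, not the method studied in the paper. It assumes a symmetric integer grid (`qmax=7`, roughly a 4-bit signed range) and unquantized floating-point scales, whereas real microscaling formats also store the scales in low precision. The function names `absmax_quantize` and `mse` and the sample values are hypothetical, chosen only for illustration.

```python
def absmax_quantize(values, block_size, qmax=7):
    """Quantize then dequantize `values`, one abs-max scale per block.

    Assumption: a symmetric integer grid [-qmax, qmax] and an exact
    floating-point scale per block (real formats quantize the scale too).
    """
    out = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        amax = max(abs(v) for v in block)
        # The scale maps the largest magnitude in the block onto qmax.
        scale = amax / qmax if amax > 0 else 1.0
        for v in block:
            # Round to the nearest grid point and clamp to the grid range.
            q = max(-qmax, min(qmax, round(v / scale)))
            out.append(q * scale)
    return out


def mse(reference, approx):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(reference, approx)) / len(reference)


# Hypothetical input: four small values followed by four large ones.
values = [0.1, -0.2, 0.15, 0.05, 8.0, -6.0, 7.0, 5.0]
for block_size in (8, 4):
    error = mse(values, absmax_quantize(values, block_size))
    print(f"block_size={block_size} mse={error}")
```

For this particular input the block size of 4 gives the lower error, because the small values receive their own scale instead of sharing one with the large values. This sketch only reproduces the intuitive case; with exact scales it does not exhibit the paradox described above.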