Quantamination: Dynamic Quantization Leaks Your Data Across the Batch

arXiv CS.LG

arXiv:2604.26505v1 Announce Type: cross Dynamic quantization has emerged as a practical approach to increase the utilization and efficiency of the machine learning serving flow. Unlike static quantization, which applies quantization offline, dynamic quantization operates on tensors at run-time, adapting its parameters to the actual input data. Today's mainstream machine learning frameworks, including ML compilers and inference engines, frequently recommend dynamic quantization as an initial step for optimizing model serving.
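
To make the static versus dynamic distinction concrete, here is a minimal sketch in plain Python. It is not taken from the paper or from any framework: the function names are hypothetical, and it assumes symmetric per-tensor int8 quantization in which the dynamic scale is derived from the largest absolute value in the whole batch. Under that assumption, the quantized values of one input depend on what else shares the batch, while a fixed offline scale gives the same result every time.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def _quantize_with_scale(x: Matrix, scale: float, qmax: int) -> List[List[int]]:
    # Round to the nearest integer level and clamp to [-qmax, qmax].
    return [[max(-qmax, min(qmax, round(v / scale))) for v in row] for row in x]


def static_quantize(x: Matrix, scale: float, num_bits: int = 8) -> List[List[int]]:
    """Static (hypothetical helper): the scale is fixed offline, before any input is seen."""
    qmax = 2 ** (num_bits - 1) - 1
    return _quantize_with_scale(x, scale, qmax)


def dynamic_quantize(x: Matrix, num_bits: int = 8) -> Tuple[List[List[int]], float]:
    """Dynamic (hypothetical helper): the scale is computed at run time from the tensor.

    Assumption: one scale per tensor, taken from the max absolute value
    across every row of the batch.
    """
    qmax = 2 ** (num_bits - 1) - 1
    abs_max = max(abs(v) for row in x for v in row)
    scale = abs_max / qmax if abs_max > 0 else 1.0
    return _quantize_with_scale(x, scale, qmax), scale


# The same input row, served in two different batches.
row_a = [0.4, -1.0, 0.25]
batch_small = [row_a, [0.1, 0.2, -0.3]]     # neighbour with small values
batch_large = [row_a, [50.0, -20.0, 10.0]]  # neighbour with large values

q_small, scale_small = dynamic_quantize(batch_small)
q_large, scale_large = dynamic_quantize(batch_large)

# A fixed scale chosen offline (arbitrary value for illustration).
fixed_scale = 0.01
s_small = static_quantize(batch_small, fixed_scale)
s_large = static_quantize(batch_large, fixed_scale)

print("dynamic, small neighbour:", q_small[0], "scale =", scale_small)
print("dynamic, large neighbour:", q_large[0], "scale =", scale_large)
print("static row_a unchanged:", s_small[0] == s_large[0])
```

In this sketch the dynamically quantized representation of `row_a` changes when only its neighbour in the batch changes, whereas the static path is unaffected. Real inference engines may use per-row, per-channel, or per-token scales instead, so whether and how this coupling appears depends on the granularity a given framework uses.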