AI RESEARCH
Early Quantization Shrinks Codebook: A Simple Fix for Diversity-Preserving Tokenization
arXiv CS.LG
•
ArXi:2603.17052v1 Announce Type: new Vector quantization is a technique in machine learning that discretizes continuous representations into a set of discrete vectors. It is widely employed in tokenizing data representations for large language models, diffusion models, and other generative models. Despite its prevalence, the characteristics and behaviors of vector quantization in generative models remain largely underexplored.