Towards Scalable Pre-training of Visual Tokenizers for Generation

arXiv:2512.13687v2 Announce Type: replace

The quality of the latent space in visual tokenizers (e.g., VAEs) is crucial for modern generative models. However, the standard reconstruction-based