VFM-VAE: Vision Foundation Models Can Be Good Tokenizers for Latent Diffusion Models

arXiv:2510.18457v3 Announce Type: replace-cross The performance of Latent Diffusion Models (LDMs) is critically dependent on the quality of their visual tokenizers. While recent works have explored incorporating Vision Foundation Models (VFMs) into the tokenizers