RPiAE: A Representation-Pivoted Autoencoder Enhancing Both Image Generation and Editing

arXiv:2603.19206v1 Announce Type: new Diffusion models have become the dominant paradigm for image generation and editing, with latent diffusion models shifting denoising to a compact latent space for efficiency and scalability. Recent attempts to leverage pretrained visual representation models as tokenizer priors either align diffusion features to representation features or directly reuse representation encoders as frozen tokenizers.