AI RESEARCH
Why Do DiT Editors Drift? Plug-and-Play Low Frequency Alignment in VAE Latent Space
arXiv cs.AI
arXiv:2605.08250v1 Announce Type: cross Recent advances in diffusion transformers (DiTs) have enabled promising single-turn image editing capabilities. However, multi-turn editing often leads to progressive semantic drift and quality degradation. In this work, we study this problem from a latent-space frequency perspective by decomposing the editing process into two functional components: VAE and DiT. Through systematic analysis in the VAE latent space, we uncover that the DiT