Rethinking Structure Preservation in Text-Guided Image Editing with Visual Autoregressive Models

arXiv:2603.28367v1

Visual autoregressive (VAR) models have recently emerged as a promising family of generative models, enabling a wide range of downstream vision tasks such as text-guided image editing. By shifting the editing paradigm from noise manipulation in diffusion-based methods to token-level operations, VAR-based approaches achieve better background preservation and significantly faster inference.
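The abstract does not spell out how token-level editing is performed. The following is a minimal sketch of one generic scheme, and it rests on several assumptions. The source image is assumed to be already encoded into a coarse-to-fine pyramid of discrete token maps, and a binary edit mask is assumed to be available at each scale. The names `blend_tokens` and `edit_multiscale` are hypothetical, and so is the stand-in model. None of this is the authors' method or any VAR library's API. The sketch shows why operating on tokens rather than noise can keep the background intact: tokens outside the mask are copied verbatim instead of being re-synthesized.

```python
import numpy as np


def blend_tokens(source_tokens: np.ndarray,
                 generated_tokens: np.ndarray,
                 edit_mask: np.ndarray) -> np.ndarray:
    """Take generated tokens inside the edit mask; copy source tokens elsewhere."""
    if not (source_tokens.shape == generated_tokens.shape == edit_mask.shape):
        raise ValueError("token maps and mask must share a shape")
    return np.where(edit_mask, generated_tokens, source_tokens)


def edit_multiscale(source_pyramid, masks, generate_scale):
    """Hypothetical coarse-to-fine editing loop over a pyramid of token maps.

    generate_scale(prefix, k, shape) is an assumed callable standing in for a
    VAR model's next-scale prediction: it receives the already-edited coarser
    scales and returns a token map for scale k with the given shape.
    """
    edited = []
    for k, (src, mask) in enumerate(zip(source_pyramid, masks)):
        gen = generate_scale(edited, k, src.shape)
        # Background tokens come straight from the source encoding.
        edited.append(blend_tokens(src, gen, mask))
    return edited


# Toy usage: a 2-scale pyramid (1x1, then 2x2) and a stand-in "model" that
# ignores its prefix and always predicts token id 7.
source = [np.array([[3]]), np.array([[1, 2], [4, 5]])]
masks = [np.array([[False]]), np.array([[False, True], [False, False]])]
toy_model = lambda prefix, k, shape: np.full(shape, 7)
result = edit_multiscale(source, masks, toy_model)
print(result[1].tolist())  # [[1, 7], [4, 5]]
```

Only the masked token at the finer scale changes. Every unmasked token matches the source exactly, which is the kind of background preservation the abstract attributes to token-level operations. A real VAR model would condition each scale on the edited prefix and the text prompt. This toy generator does neither.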