A training-free framework for high-fidelity appearance transfer via diffusion transformers

arXiv:2603.26767v1 Announce Type: new Diffusion Transformers (DiTs) excel at generation, but their global self-attention makes controllable, reference-image-based editing a distinct challenge. Unlike in U-Nets, naively injecting local appearance into a DiT can disrupt its holistic scene structure. We address this by proposing the first