The devil is in the details: Enhancing Video Virtual Try-On via Keyframe-Driven Details Injection

arXiv:2512.20340v3 Announce Type: replace Although diffusion transformer (DiT)-based video virtual try-on (VVT) has made significant progress in synthesizing realistic videos, existing methods still struggle to capture fine-grained garment dynamics and preserve background integrity across video frames. They also incur high computational costs due to additional interaction modules