HighSync: High-Quality Lip Synchronization via Latent Diffusion Models

arXiv:2605.16918v1 Announce Type: new

We present HighSync, an end-to-end diffusion-based framework for high-fidelity lip synchronization that generates photorealistic talking-face videos aligned with arbitrary input audio. Existing approaches consistently struggle to reconcile image quality with synchronization accuracy, producing either visually degraded outputs or temporally inconsistent lip movements.