TurboTalk: Progressive Distillation for One-Step Audio-Driven Talking Avatar Generation

arXiv:2604.14580v1 Announce Type: new Existing audio-driven video digital human generation models rely on multi-step denoising, resulting in substantial computational overhead that severely limits their deployment in real-world settings. While one-step distillation approaches can significantly accelerate inference, they often suffer from