FluentAvatar: Flicker-Free Talking-Head Animation via Phoneme-Guided Autoregressive Modeling

ArXi:2509.12052v3 Announce Type: replace Current talking-head generation has gradually shifted from GAN-based methods to diffusion-based paradigms, achieving remarkable progress in visual fidelity and temporal consistency. However, inter-frame flicker remains prevalent in existing diffusion-based methods. An important reason is that denoising trajectory variation induced by stochastic initialization leaves residual inter-frame inconsistencies, which manifest as short-term, abrupt visual fluctuations between adjacent frames.