MMTalker: Multiresolution 3D Talking Head Synthesis with Multimodal Feature Fusion

arXiv:2604.02941v1 Announce Type: new

Speech-driven three-dimensional (3D) facial animation synthesis aims to build a mapping from one-dimensional (1D) speech signals to time-varying 3D facial motion signals. Current methods still struggle to maintain lip-sync accuracy and to produce realistic facial expressions, primarily because this cross-modal mapping is highly ill-posed. In this paper, we first obtain a continuous, detail-preserving representation of the 3D face through mesh parameterization and non-uniform differentiable sampling.
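The abstract does not say how the parameterization or the sampling is implemented, so the following is only a minimal sketch under our own assumptions, not the MMTalker method. We assume the face mesh has already been parameterized onto the unit UV square and resampled into a regular "geometry image" whose cells store 3D positions. Bilinear lookup then makes every sampled point a differentiable function of its UV coordinates (and of the grid values), and a monotone warp of an evenly spaced UV grid yields a non-uniform sample layout that is denser around a chosen location, such as the mouth. The names `bilinear_sample`, `warp`, `nonuniform_uv_samples` and the `gamma` parameter are hypothetical.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Grid = List[List[Vec3]]  # assumed geometry image: grid[row][col] = (x, y, z)


def bilinear_sample(grid: Grid, u: float, v: float) -> Tuple[Vec3, Vec3, Vec3]:
    """Bilinearly sample a geometry image at continuous UV coordinates in [0, 1].

    Returns the interpolated 3D point and its partial derivatives with respect
    to u and v, which is what lets such a lookup sit inside gradient-based training.
    """
    h, w = len(grid), len(grid[0])
    if h < 2 or w < 2:
        raise ValueError("geometry image must be at least 2x2")
    u = min(max(u, 0.0), 1.0)
    v = min(max(v, 0.0), 1.0)
    # Continuous pixel coordinates; clamp the cell index so u == 1 stays in range.
    x, y = u * (w - 1), v * (h - 1)
    c0, r0 = min(int(x), w - 2), min(int(y), h - 2)
    tx, ty = x - c0, y - r0
    p00, p01 = grid[r0][c0], grid[r0][c0 + 1]
    p10, p11 = grid[r0 + 1][c0], grid[r0 + 1][c0 + 1]
    corners = list(zip(p00, p01, p10, p11))  # per-axis corner values
    point = tuple(
        (1 - tx) * (1 - ty) * a + tx * (1 - ty) * b + (1 - tx) * ty * c + tx * ty * d
        for a, b, c, d in corners
    )
    # Analytic derivatives (valid for u, v inside the unit square),
    # using the chain rule through x = u * (w - 1) and y = v * (h - 1).
    d_u = tuple(((1 - ty) * (b - a) + ty * (d - c)) * (w - 1) for a, b, c, d in corners)
    d_v = tuple(((1 - tx) * (c - a) + tx * (d - b)) * (h - 1) for a, b, c, d in corners)
    return point, d_u, d_v


def warp(t: float, center: float, gamma: float) -> float:
    """Monotone map [0, 1] -> [0, 1] that fixes 0, `center` and 1.

    With gamma > 1 its slope vanishes at `center` (0 < center < 1), so evenly
    spaced inputs come out clustered around `center`.
    """
    if t < center:
        return center - center * ((center - t) / center) ** gamma
    return center + (1 - center) * ((t - center) / (1 - center)) ** gamma


def nonuniform_uv_samples(n: int, center: Tuple[float, float],
                          gamma: float = 2.0) -> List[Tuple[float, float]]:
    """n x n UV samples, denser near `center` (hypothetical region of interest)."""
    ts = [i / (n - 1) for i in range(n)]
    us = [warp(t, center[0], gamma) for t in ts]
    vs = [warp(t, center[1], gamma) for t in ts]
    return [(u, v) for v in vs for u in us]


if __name__ == "__main__":
    # Toy 3x3 "face": a flat sheet with a single raised vertex in the middle.
    grid = [[(float(c), float(r), 1.0 if (r, c) == (1, 1) else 0.0)
             for c in range(3)] for r in range(3)]
    for u, v in nonuniform_uv_samples(5, center=(0.5, 0.5))[:3]:
        p, du, dv = bilinear_sample(grid, u, v)
        print(f"uv=({u:.3f}, {v:.3f}) -> point={p}, dp/du={du}")
```

The `u * (W - 1)` mapping corresponds to PyTorch's `grid_sample` with `align_corners=True` after rescaling UV from [-1, 1] to [0, 1]; in a real training pipeline one would more likely use such a library routine and let autograd supply the derivatives instead of writing them out by hand.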