TeMuDance: Contrastive Alignment-Based Textual Control for Music-Driven Dance Generation

arXiv:2604.17005v1 Announce Type: new Existing music-driven dance generation approaches have achieved strong realism and effective audio-motion alignment. However, they generally lack semantic controllability, making it difficult to guide specific movements through natural language descriptions. This limitation primarily stems from the absence of large-scale datasets that jointly align music, text, and motion for supervised learning of text-conditioned control.