AI RESEARCH
UniMotion: A Unified Framework for Motion-Text-Vision Understanding and Generation
arXiv CS.AI
arXiv:2603.22282v1 Announce Type: cross We present UniMotion, to our knowledge the first unified framework for simultaneous understanding and generation of human motion, natural language, and RGB images within a single architecture. Existing unified models handle only restricted modality subsets (e.g., Motion-Text or static Pose-Image) and predominantly rely on discrete tokenization, which