MotionGrounder: Grounded Multi-Object Motion Transfer via Diffusion Transformer

arXiv:2604.00853v1

Motion transfer enables controllable video generation by transferring temporal dynamics from a reference video to synthesize a new video conditioned on a target caption. However, existing Diffusion Transformer (DiT)-based methods are limited to single-object videos, restricting fine-grained control in real-world scenes with multiple objects. In this work, we