Not All Frames Are Equal: Complexity-Aware Masked Motion Generation via Motion Spectral Descriptors

arXiv:2603.29655v1

Masked generative models have become a strong paradigm for text-to-motion synthesis, but they still treat all motion frames uniformly during masking, attention, and decoding. This is a poor match for motion, where local dynamic complexity varies sharply over time. We show that current masked motion generators degrade disproportionately on dynamically complex motions, and that frame-wise generation error is strongly correlated with motion dynamics. Motivated by this mismatch, we…
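The claim that frame-wise error tracks motion dynamics can be made concrete with a small analysis sketch. This excerpt does not define the paper's Motion Spectral Descriptors, so the code below swaps in a simpler, hypothetical proxy: the per-frame acceleration magnitude from a central second difference. It then measures how strongly a per-frame error sequence correlates with that proxy using the Pearson coefficient. The helper names `frame_dynamics` and `pearson`, the flattened-coordinate frame layout, and the toy data are all illustrative assumptions, not the authors' method.

```python
import math
from typing import List, Sequence

# Assumption: each frame is a flat sequence of joint coordinates.
Frame = Sequence[float]


def frame_dynamics(motion: Sequence[Frame]) -> List[float]:
    """Per-frame acceleration magnitude (hypothetical complexity proxy).

    NOT the paper's Motion Spectral Descriptor. Uses the central second
    difference x[t+1] - 2*x[t] + x[t-1]; edge frames copy their nearest
    interior value.
    """
    T = len(motion)
    if T < 3:
        return [0.0] * T
    acc = []
    for t in range(1, T - 1):
        # Euclidean norm of the second difference over all coordinates.
        sq = sum(
            (a - 2.0 * b + c) ** 2
            for a, b, c in zip(motion[t + 1], motion[t], motion[t - 1])
        )
        acc.append(math.sqrt(sq))
    return [acc[0]] + acc + [acc[-1]]


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # undefined for a constant sequence; this sketch reports 0
    return cov / math.sqrt(vx * vy)


# Toy 1-D trajectory: still, then a burst of motion, then still again.
motion = [[0.0], [0.0], [0.0], [1.0], [3.0], [3.0], [3.0]]
dyn = frame_dynamics(motion)  # [0.0, 0.0, 1.0, 1.0, 2.0, 0.0, 0.0]
# Synthetic per-frame errors, made up for illustration only.
errors = [0.1 + 0.4 * d for d in dyn]
print(round(pearson(errors, dyn), 6))  # prints 1.0
```

Because the synthetic errors are a linear function of the proxy, the correlation here is 1 by construction. With real generator outputs one would substitute measured per-frame errors and whatever complexity descriptor the method actually uses.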