AI RESEARCH

Mixing Times of Glauber Dynamics on Masked Language Models

arXiv CS.LG

arXiv:2605.16378v1 Announce Type: new Masked language models (MLMs) define local conditional distributions over tokens but do not, in general, correspond to any consistent joint distribution over sequences. This raises a fundamental question: what global distributional behavior is induced when such conditionals are used iteratively for generation? We address this question by modeling iterative masked-token resampling as a Glauber dynamics Markov chain on the discrete space of token sequences. We first show that MLM conditionals are intrinsically incompatible: we.
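
To make the resampling procedure concrete, here is a minimal sketch of a Glauber dynamics chain over token sequences: at each step one position is chosen uniformly at random and its token is redrawn from a conditional distribution given all other positions. Everything named here is an assumption for illustration and not taken from the paper: `toy_conditional` is a hypothetical stand-in for an MLM's masked-token distribution, and the three-token vocabulary is arbitrary. Note that this toy conditional happens to be compatible with a joint distribution, whereas the abstract's point is that real MLM conditionals in general are not.

```python
import math
import random

VOCAB_SIZE = 3  # hypothetical toy vocabulary {0, 1, 2}


def toy_conditional(seq, i):
    """Hypothetical stand-in for an MLM's p(x_i | x_{-i}).

    A token's weight grows with the number of adjacent tokens it matches.
    """
    left = seq[i - 1] if i > 0 else None
    right = seq[i + 1] if i < len(seq) - 1 else None
    weights = [
        math.exp((tok == left) + (tok == right)) for tok in range(VOCAB_SIZE)
    ]
    total = sum(weights)
    return [w / total for w in weights]


def glauber_step(seq, rng):
    """One update: pick a uniform position and resample it from its conditional."""
    i = rng.randrange(len(seq))
    probs = toy_conditional(seq, i)
    new_seq = list(seq)
    new_seq[i] = rng.choices(range(VOCAB_SIZE), weights=probs, k=1)[0]
    return new_seq


def run_chain(seq, num_steps, seed=0):
    """Apply num_steps Glauber updates starting from seq."""
    rng = random.Random(seed)
    for _ in range(num_steps):
        seq = glauber_step(seq, rng)
    return seq


if __name__ == "__main__":
    print(run_chain([0, 1, 2, 0, 1, 2], num_steps=200))
```

With a real model, `toy_conditional` would be replaced by a forward pass that masks position `i` and reads off the predicted token distribution; the chain structure itself stays the same.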