A Theoretical Analysis of Why Masked Diffusion Models Mitigate the Reversal Curse

arXiv:2602.02133v2. Autoregressive language models (ARMs) suffer from the reversal curse: after learning "$A$ is $B$," they often fail on the reverse query "$B$ is $A$." Masked diffusion language models (MDMs) exhibit this failure in a much weaker form, but the underlying reason has remained unclear. A common explanation attributes this mitigation to their any-order masked