Learning Discrete Autoregressive Priors with Wasserstein Gradient Flow

arXiv:2605.06148v1 Announce Type: cross Discrete image tokenizers are commonly trained in two stages: first for reconstruction, and then with a prior model fitted to the frozen token sequences. This decoupling leaves the tokenizer unaware of the model that will later generate its tokens. As a result, the learned tokens may preserve image information well but still be difficult for an autoregressive (AR) prior to predict from left to right.