Vision Foundation Models as Generalist Tokenizers for Image Generation

arXiv:2605.18390v1

In this work, we investigate a largely unexplored direction: building a generalist image tokenizer directly on top of a frozen vision foundation model (VFM). To build this tokenizer, we use a frozen VFM as the encoder and