UniWeTok: An Unified Binary Tokenizer with Codebook Size $\mathit{2^{128}}$ for Unified Multimodal Large Language Model

arXiv:2602.14178v3 Announce Type: replace-cross Unified Multimodal Large Language Models (MLLMs) require a visual representation that simultaneously supports high-fidelity reconstruction, complex semantic extraction, and generative suitability. However, existing visual tokenizers typically struggle to satisfy these conflicting objectives within a single framework. In this paper, we