Large Language Model as Token Compressor and Decompressor

arXiv:2603.25340v1 Announce Type: new In this paper, we establish the novel insight that an off-the-shelf LLM can function as an excellent token compressor and decompressor. To demonstrate this, we design a self-expressive autoencoding learning framework that fine-tunes a pretrained LLM to translate long texts into a compact internal language of discrete, variable-length latent codes, termed Z-tokens, and to reconstruct the original text exactly from them.
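
To make the round-trip contract concrete, the following is a minimal toy sketch, not the paper's method. It assumes a small hand-written phrase codebook in place of the fine-tuned LLM, and the names `compress`, `decompress`, and `codebook` are hypothetical. It only illustrates the two properties the abstract names: the code sequence is discrete and its length varies with the input, and decoding recovers the original tokens exactly.

```python
from typing import Dict, List, Tuple, Union

Token = str
# Toy stand-in for a Z-token: an int is a codebook id, a str is a literal
# token passed through unchanged. (Assumption: the paper's Z-tokens are
# learned by the LLM, not looked up in a fixed table like this one.)
ZToken = Union[int, Token]
Codebook = Dict[Tuple[Token, ...], int]


def compress(tokens: List[Token], codebook: Codebook) -> List[ZToken]:
    """Greedy longest-match: replace known phrases with a single code id."""
    max_len = max((len(phrase) for phrase in codebook), default=1)
    out: List[ZToken] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = tuple(tokens[i:i + n])
            if phrase in codebook:
                out.append(codebook[phrase])
                i += n
                break
        else:
            # No phrase matched: emit the token itself as a literal.
            out.append(tokens[i])
            i += 1
    return out


def decompress(z_tokens: List[ZToken], codebook: Codebook) -> List[Token]:
    """Invert compress(): expand code ids, copy literals through."""
    inverse = {code: phrase for phrase, code in codebook.items()}
    out: List[Token] = []
    for z in z_tokens:
        if isinstance(z, int):
            out.extend(inverse[z])
        else:
            out.append(z)
    return out


if __name__ == "__main__":
    codebook: Codebook = {("the", "cat", "sat", "on"): 0, ("the",): 1}
    text = "the cat sat on the mat and the cat sat on the rug".split()
    z = compress(text, codebook)
    assert decompress(z, codebook) == text  # exact reconstruction
    print(f"{len(text)} tokens -> {len(z)} codes")
```

In this toy the number of emitted codes depends on how much of the input the codebook covers, which is the sense in which the code is variable-length. The abstract's claim is that a fine-tuned LLM can play both roles itself, without a fixed table.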