[D] Lossless tokenizers lose nothing and add nothing — trivial observation or worth formalizing?

r/MachineLearning

I wrote up a short information-theoretic argument for why lossless tokenization neither restricts the expressiveness of language models nor