Scratchpad Patching: Decoupling Compute from Patch Size in Byte-Level Language Models

arXiv CS.LG

arXiv:2605.09630v1 Announce Type: cross

Tokenizer-free language models eliminate the tokenizer step of the language modeling pipeline by operating directly on bytes; patch-based variants further aggregate contiguous byte spans into patches for efficiency. However, the average patch size chosen at the model design stage governs a tight trade-off: larger patches reduce compute and KV-cache footprint, but degrade modeling quality.
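
To make the efficiency side of this trade-off concrete, the following is a rough back-of-the-envelope sketch, not the paper's method. It assumes that the KV cache is dominated by a transformer that stores one key/value entry per patch, and that cache size grows linearly with the number of patches. The function names and all hyperparameters (layer count, KV heads, head dimension, 2 bytes per value) are hypothetical and chosen only for illustration. The sketch says nothing about the modeling-quality side of the trade-off.

```python
import math


def num_patches(num_bytes: int, avg_patch_size: float) -> int:
    """Approximate number of patches for a byte sequence (illustrative only)."""
    return math.ceil(num_bytes / avg_patch_size)


def kv_cache_bytes(num_positions: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_value: int = 2) -> int:
    """KV-cache size for a standard attention stack.

    The factor 2 covers keys and values; every layer caches one key and one
    value vector per KV head per position. All sizes are assumptions.
    """
    return 2 * n_layers * n_kv_heads * head_dim * num_positions * bytes_per_value


if __name__ == "__main__":
    seq_bytes = 8192  # hypothetical input length in bytes
    for patch_size in (1, 4, 8, 16):
        n = num_patches(seq_bytes, patch_size)
        mib = kv_cache_bytes(n, n_layers=24, n_kv_heads=16, head_dim=64) / 2**20
        print(f"patch size {patch_size:>2}: {n:>5} patches, ~{mib:.0f} MiB KV cache")
```

Under these assumptions, doubling the average patch size halves both the number of positions the patch-level model processes and its KV-cache footprint, which is why the patch size fixed at design time ties efficiency so closely to the granularity at which the model sees the input.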