Built a zero allocation, header only C++ Qwen tokenizer that is nearly 20x faster than openai Tiktoken

r/LocalLLaMA
Generative AI NLP Open Source AI AI Research

I'm into HPC, and C++ static, zero allocation and zero dependancy software. I was studying BPE tokenizers, how do they work, so decided to build that project. I hardcoded qwen tokenizer for LLMs developers. I really know that whole Tokenization phase in llm inference is worth less than 2% of whole time, so practically negligible, but I just "love" to do that kind of programming, it's just an educational project for me to learn and build some intuition. Surprisingly after combining multiple different optimization techniques, it scored really high numbers in benchmarks.