You're all wrong! There's only 2 r's in strawberry!

r/ChatGPT
Generative AI NLP

Everyone has seen their AI badly flunk the "strawberry" test:

User: Count the r's in strawberry.
Agent: I see 2 r's!
User: That's not right. There are 3.
Agent: I counted again. There are 2.
User: Spell the word.
Agent: s-t-r-a-w-b-e-r-r-y
User: Count those r's.
Agent: I see 3.
User: Ok, now how many are in the full word: strawberry
Agent: Just 2 that I can see!

*And so on forever*

LLMs are not trained on English words or Latin letters. They are trained on tokenized representations of those words and letters. Tokenization means "splitting a big thing into parts."
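To make that concrete, here's a rough sketch of what "splitting into parts" can look like. Big caveat: the vocabulary, the IDs, and the `toy_tokenize` helper are all made up for illustration. Real tokenizers learn their pieces from data and may split "strawberry" differently. The point is only that the model gets handed a short list of numbers, not ten letters.

```python
# Hypothetical toy vocabulary (an assumption for this sketch).
# Real tokenizers learn tens of thousands of pieces from training data.
TOY_VOCAB = {"str": 101, "aw": 102, "berry": 103}


def toy_tokenize(word, vocab=TOY_VOCAB):
    """Greedy longest-match split of `word` into pieces found in `vocab`."""
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest remaining substring first, then shrink it.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            # Nothing in the vocabulary matched: fall back to one character.
            pieces.append(word[i])
            i += 1
    return pieces


pieces = toy_tokenize("strawberry")
ids = [TOY_VOCAB.get(p, 0) for p in pieces]  # 0 = unknown piece in this toy setup

print(pieces)                    # ['str', 'aw', 'berry']
print(ids)                       # [101, 102, 103]
print("strawberry".count("r"))   # 3
```

Counting r's in the Python string is trivial because the letters are right there. In `[101, 102, 103]` there are no letters to count, only IDs, and in this toy split the three r's are spread across two different pieces.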