AI RESEARCH
Morphemes Without Borders: Evaluating Root-Pattern Morphology in Arabic Tokenizers and LLMs
arXiv CS.AI
•
ArXi:2603.15773v1 Announce Type: cross This work investigates how effectively large language models (LLMs) and their tokenization schemes represent and generate Arabic root-pattern morphology, probing whether they capture genuine morphological structure or rely on surface memorization. Arabic morphological system provides a rich testbed for analyzing how LLMs handle complex, non-concatenative forms and how tokenization choices influence this process.