RUMLEM: A Dictionary-Based Lemmatizer for Romansh
arXiv cs.CL
arXiv:2604.11233v1 Announce Type: new

Lemmatization, the task of mapping an inflected word form to its dictionary form, is a crucial component of many NLP applications. In this paper, we present RUMLEM, a lemmatizer that covers the five main varieties of Romansh as well as the supra-regional standard variety Rumantsch Grischun. RUMLEM is based on comprehensive, community-driven morphological databases for Romansh, which enable it to cover 77-84% of the words in a typical Romansh text.
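The following is a minimal Python sketch of how a dictionary-based lemmatizer and its coverage figure could work. It assumes a lexicon keyed first by variety and then by inflected form, with each form mapping to a list of candidate lemmas. The names `LEXICON`, `lemmatize`, and `coverage`, the variety keys, and the toy entries are all hypothetical illustrations, not RUMLEM's actual data format or API. Coverage is computed here as the share of tokens that receive at least one lemma, which is one plausible reading of the 77-84% figure.

```python
from collections.abc import Iterable

# Hypothetical toy lexicon: variety -> inflected form -> candidate lemmas.
# Entries are illustrative only; they are NOT taken from RUMLEM's
# morphological databases, and the variety keys are assumptions.
LEXICON: dict[str, dict[str, list[str]]] = {
    "rumantsch_grischun": {
        "chasa": ["chasa"],
        "chasas": ["chasa"],
        "sun": ["esser"],
        "è": ["esser"],
    },
}


def lemmatize(token: str, variety: str) -> list[str]:
    """Return candidate lemmas for `token`, or [] if the form is not listed."""
    forms = LEXICON.get(variety, {})
    # Try the exact form first, then a lowercased fallback
    # (e.g. for sentence-initial capitalized words).
    return forms.get(token) or forms.get(token.lower(), [])


def coverage(tokens: Iterable[str], variety: str) -> float:
    """Fraction of tokens for which the lexicon yields at least one lemma."""
    token_list = list(tokens)
    if not token_list:
        return 0.0
    covered = sum(1 for t in token_list if lemmatize(t, variety))
    return covered / len(token_list)


if __name__ == "__main__":
    print(lemmatize("Chasas", "rumantsch_grischun"))
    print(coverage(["chasas", "sun", "è", "unknown"], "rumantsch_grischun"))
```

Returning a list rather than a single string leaves room for ambiguous forms that map to several lemmas; how RUMLEM itself resolves such ambiguity is not described in this abstract.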