AI RESEARCH

LGSE: Lexically Grounded Subword Embedding Initialization for Low-Resource Language Adaptation

arXiv CS.AI

arXiv:2603.22629v1 Announce Type: cross Adapting pretrained language models to low-resource, morphologically rich languages remains a significant challenge. Existing vocabulary expansion methods typically rely on arbitrarily segmented subword units, resulting in fragmented lexical representations and loss of critical morphological information. To address this limitation, we propose the Lexically Grounded Subword Embedding Initialization (LGSE) framework, which