ANGOFA: Leveraging OFA Embedding Initialization and Synthetic Data for Angolan Language Model

arXiv:2404.02534v2 Announce Type: replace In recent years, the development of pre-trained language models (PLMs) has gained momentum, showcasing their capacity to transcend linguistic barriers and facilitate knowledge transfer across diverse languages. However, this progress has largely excluded very-low-resource languages, leaving a notable void in the multilingual landscape. This paper addresses this gap by