AI RESEARCH

MzansiText and MzansiLM: An Open Corpus and Decoder-Only Language Model for South African Languages

arXiv cs.CL

arXiv:2603.20732v1 (Announce Type: new)

Decoder-only language models can be adapted to diverse tasks through instruction finetuning, but the extent to which this generalizes at small scale for low-resource languages remains unclear. We focus on the languages of South Africa, where we are not aware of a publicly available decoder-only model that explicitly targets all eleven official written languages, nine of which are low-resource. We