MzansiText and MzansiLM: An Open Corpus and Decoder-Only Language Model for South African Languages
arXiv cs.CL
arXiv:2603.20732v1
Decoder-only language models can be adapted to diverse tasks through instruction finetuning, but how well this adaptability generalizes to small models for low-resource languages remains unclear. We focus on the languages of South Africa, where we are not aware of a publicly available decoder-only model that explicitly targets all eleven official written languages, nine of which are low-resource. We