AI RESEARCH

KS-PRET-5M: A 5 Million Word, 12 Million Token Kashmiri Pretraining Dataset

arXiv cs.CL

arXiv:2604.11066v1 Announce Type: new We present KS-PRET-5M, the largest publicly available pre