AI RESEARCH
KS-PRET-5M: A 5 Million Word, 12 Million Token Kashmiri Pretraining Dataset
arXiv cs.CL
arXiv:2604.11066v1 Announce Type: new
We present KS-PRET-5M, the largest publicly available pre