AI RESEARCH
SiDiaC-v.2.0: Sinhala Diachronic Corpus Version 2.0
arXiv CS.CL
arXiv:2603.10861v1 Announce Type: new
SiDiaC-v.2.0 is the largest comprehensive Sinhala Diachronic Corpus to date. By publication date it covers 1800 CE to 1955 CE, and by date of writing it spans the 5th to the 20th century CE. The corpus consists of 244k words across 185 literary works that underwent thorough filtering, preprocessing, and