I spent years building a 103B-token Usenet corpus (1980–2013) and finally documented it [P]

r/MachineLearning

For the past several years I've been quietly assembling and processing what I believe is one of the larger privately held pre