I spent years building a 103B-token Usenet corpus (1980–2013) and finally documented it [P]
r/MachineLearning
For the past several years I've been quietly assembling and processing what I believe is one of the larger privately held pre