AI RESEARCH
Generating Pretraining Tokens from Organic Data for Data-Bound Scaling
arXiv cs.LG • May 19, 2026
LLM pre