
TF1-EN-3M: Three Million Synthetic Moral Fables for Training Small, Open Language Models

arXiv cs.LG

arXiv:2504.20605v2 (Announce Type: replace-cross)

Moral stories are a time-tested vehicle for transmitting values, yet modern NLP lacks a large, structured corpus that couples coherent narratives with explicit ethical lessons. We present TF1-EN-3M, to our knowledge the first open dataset of three million English-language fables generated exclusively by instruction-tuned models no larger than 8B parameters.