[D] Released a 100k-sample dataset on Hugging Face

r/LocalLLaMA

We’ve released a 100,000-sample Chain-of-Thought (CoT) dataset for fine-tuning local reasoning models. Each sample includes an explicit intermediate reasoning trace instead of only a final answer. The goal is to improve reasoning consistency during supervised fine-tuning, especially for smaller local models. We’re sharing it here to gather feedback from people working on local LLM fine-tuning and reasoning distillation.
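To make the difference between CoT and answer-only supervision concrete, here is a rough sketch of how one sample could be turned into a training pair. The field names (`question`, `reasoning`, `answer`), the prompt template, and the example row are all made up for illustration, so check the dataset card for the actual schema before using anything like this.

```python
from typing import TypedDict


class CoTSample(TypedDict):
    # Hypothetical field names; the released dataset's schema may differ.
    question: str
    reasoning: str
    answer: str


def to_sft_pair(sample: CoTSample, with_reasoning: bool = True) -> dict[str, str]:
    """Build a prompt/completion pair for supervised fine-tuning.

    With with_reasoning=True the completion contains the reasoning trace
    followed by the answer; otherwise it contains the answer alone.
    """
    prompt = f"Question: {sample['question']}\n"
    if with_reasoning:
        completion = f"Reasoning: {sample['reasoning']}\nAnswer: {sample['answer']}"
    else:
        completion = f"Answer: {sample['answer']}"
    return {"prompt": prompt, "completion": completion}


# Illustrative row, not taken from the dataset.
example: CoTSample = {
    "question": "A pack has 12 pencils. How many pencils are in 3 packs?",
    "reasoning": "Each pack has 12 pencils, so 3 packs have 3 * 12 = 36 pencils.",
    "answer": "36",
}

cot_pair = to_sft_pair(example)
answer_only_pair = to_sft_pair(example, with_reasoning=False)
```

Both pairs share the same prompt; only the completion changes, which is the part the model is trained to produce.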