Finetuning Dataset: Claude Opus 4.6/4.7 - 8.7k Chats

r/LocalLLaMA

A synthetic fine-tuning dataset generated with Claude Opus 4.6/4.7: 8,706 examples in total, all with reasoning. I haven't reviewed the data, but some basic cleaning was applied. Refusals and safety responses should be suppressed. I made it because I had extra usage left on a plan before it expired.

| Split | File | Examples | Contents |
|---|---|---|---|
| **Full** | `full_train.jsonl` | 8,706 | All examples across all 28 categories. |
| **Instruct** | `instruct_train.jsonl` | 7,217 | All 24 instructional categories: coding, math, sciences, humanities, arts, finance, medicine, law, business, linguistics, creative writing, general. |
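
Since I haven't reviewed the data myself, here's a minimal sketch for sanity-checking a split before training on it. It only assumes the files are standard JSON Lines (one JSON object per line). The `category` key is a guess on my part, since the record schema isn't described above, so swap in whatever field the records actually use.

```python
import json
from collections import Counter
from pathlib import Path


def load_jsonl(path):
    """Read a JSON Lines file into a list of dicts, skipping blank lines."""
    records = []
    with Path(path).open(encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate stray blank lines
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as exc:
                # Report the offending line so it can be inspected or dropped
                raise ValueError(f"{path}:{line_no}: invalid JSON ({exc})") from exc
    return records


def count_by(records, key="category"):
    """Count records per value of `key` (hypothetical field name).

    Records that lack the key are counted under None.
    """
    return Counter(rec.get(key) for rec in records)
```

If the files match the table, `len(load_jsonl("full_train.jsonl"))` should come out to 8,706 and `len(load_jsonl("instruct_train.jsonl"))` to 7,217. If a category-like field exists, `count_by(records, key=...)` gives a quick per-category breakdown.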