On the Step Length Confounding in LLM Reasoning Data Selection

arXiv:2604.06834v1

Large reasoning models have recently demonstrated strong performance on complex tasks that require long chain-of-thought reasoning, achieved through supervised fine-tuning on large-scale, high-quality datasets. To construct such datasets, existing pipelines generate long reasoning data from capable Large Language Models (LLMs) and apply manual heuristics or naturalness-based selection methods to filter for high-quality samples.