Hindsight Hint Distillation: Scaffolded Reasoning for SWE Agents from CoT-free Answers

arXiv:2605.11556v1 Announce Type: new Solving complex long-horizon tasks requires strong planning and reasoning capabilities. Although datasets with explicit chain-of-thought (CoT) rationales can substantially benefit learning, they are costly to obtain. To address this challenge, we propose Hindsight Hint Distillation (HHD), which requires only easy-to-obtain question-answer pairs without CoT annotations.