Why We Think

Special thanks to John Schulman for a lot of super valuable feedback and direct edits on this post.

Test-time compute (Graves 2016, Ling 2017, Cobbe 2021) and chain-of-thought (CoT) (Wei 2022, Nye 2021) have led to significant improvements in model performance, while raising many research questions. This post aims to review recent developments in how to effectively use test-time compute (i.e., “thinking time”) and why it helps.