AI SAFETY & ETHICS

How will we do SFT on models with opaque reasoning?

Alignment Forum

Current LLMs externalize lots of their reasoning in human interpretable language. This reasoning is sometimes unfaithful, sometimes strange and concerning, and LLMs can do somewhat impressive reasoning without using CoT, but my overall impression is that CoT currently is a reasonably complete and accurate representation of LLM reasoning. However, reasoning in interpretable language might turn out to be uncompetitive - if so, it seems probable that opaque reasoning will be adopted in frontier AI labs. If future AI models have opaque reasoning, this will probably change what