Learning to Generate Formally Verifiable Step-by-Step Logic Reasoning via Structured Formal Intermediaries

arXiv:2603.29500v1 Announce Type: new Large language models (LLMs) have recently demonstrated impressive performance on complex, multi-step reasoning tasks, especially when post-trained with outcome-rewarded reinforcement learning (Guo 2025). However, outcome rewards often overlook flawed intermediate steps, so the reasoning can be unreliable even when the final answer is correct.