Graph-Based Chain-of-Thought Pruning for Reducing Redundant Reflections in Reasoning LLMs

arXiv:2604.05643v1 Announce Type: new Extending chain-of-thought (CoT) reasoning through reinforcement learning (RL) has been widely used to enhance the reasoning capabilities of large language models (LLMs). However, because reward signals are sparse, this training can also induce undesirable thinking patterns such as overthinking, i.e., generating redundant intermediate reasoning content.