Drop the Act: Probe-Filtered RL for Faithful Chain-of-Thought Reasoning

arXiv:2605.11467v1 Announce Type: cross Reasoning models often rationalize, post hoc, answers they have already committed to internally, producing chains of *reasoning theater*: deliberative-looking steps that contribute nothing to correctness. This wastes inference tokens, pollutes interpretability, and obscures what the model actually computed. We