MAPLE: Elevating Medical Reasoning from Statistical Consensus to Process-Led Alignment

arXiv:2603.08987v1

Recent work on medical large language models (LLMs) has explored Test-Time Reinforcement Learning (TTRL) to enhance reasoning. However, standard TTRL often relies on majority voting (MV) as a heuristic supervision signal. This signal can be unreliable in complex medical scenarios, where the most frequent reasoning path is not necessarily the clinically correct one. In this work, we propose a novel and unified