Learning from Contrasts: Synthesizing Reasoning Paths from Diverse Search Trajectories

arXiv:2604.11365v1

Monte Carlo Tree Search (MCTS) has been widely used for automated reasoning data exploration, but current supervision extraction methods remain inefficient. Standard approaches retain only the single highest-reward trajectory, discarding the comparative signals present in the many explored paths. Here we