Restoring Exploration after Post-Training: Latent Exploration Decoding for Large Reasoning Models

arXiv:2602.01698v2 Announce Type: replace-cross Large Reasoning Models (LRMs) have recently achieved strong mathematical and code reasoning performance through Reinforcement Learning (RL) post-