OGER: A Robust Offline-Guided Exploration Reward for Hybrid Reinforcement Learning

arXiv cs.AI

arXiv:2604.18530v1 Announce Type: new

Recent advances in Reinforcement Learning with Verifiable Rewards (RLVR) have significantly improved Large Language Model (LLM) reasoning, yet models often struggle to explore novel trajectories beyond their initial latent space. Offline teacher guidance and entropy-driven strategies have been proposed to address this, but they often lack deep integration or remain constrained by the model's inherent capacity.