Attention-Guided Reward for Reinforcement Learning-based Jailbreak against Large Reasoning Models

arXiv:2605.19485v1 Announce Type: new Large Reasoning Models (LRMs) have demonstrated remarkable capabilities in solving complex problems by generating structured, step-by-step reasoning content. However, exposing a model's internal reasoning process