Cycle-Consistent Search: Question Reconstructability as a Proxy Reward for Search Agent Training

arXiv:2604.12967v1

Reinforcement Learning (RL) has shown strong potential for optimizing search agents in complex information retrieval tasks. However, existing approaches predominantly rely on gold supervision, such as ground-truth answers, which is difficult to scale. To address this limitation, we propose Cycle-Consistent Search (CCS), a gold-supervision-free framework for