SubSearch: Intermediate Rewards for Unsupervised Guided Reasoning in Complex Retrieval

arXiv:2604.07415v1 Announce Type: cross Large language models (LLMs) are probabilistic in nature and perform reliably when augmented with external information. Complex queries remain challenging because they often require multi-step reasoning over the retrieved information, with no clear or predetermined reasoning path. Recent approaches train models using reinforcement learning on the model's outcome, showing promise in improving how models handle complex information. We