Route Before Retrieve: Activating Latent Routing Abilities of LLMs for RAG vs. Long-Context Selection

arXiv:2605.10235v2 Announce Type: replace

Recent advances in large language models (LLMs) have expanded context windows beyond 128K tokens, enabling long-document understanding and multi-source reasoning. A key challenge, however, lies in choosing between retrieval-augmented generation (RAG) and long-context (LC) strategies: RAG is efficient but constrained by retrieval quality, while LC supports global reasoning at higher cost and with position sensitivity.