F-GRPO: Factorized Group-Relative Policy Optimization for Unified Candidate Generation and Ranking

arXiv:2605.12995v1 Announce Type: new Traditional retrieval pipelines optimize utility through stages of candidate retrieval and reranking, where ranking operates over a predefined candidate set. Large Language Models (LLMs) broaden this into a generative process: given a candidate pool, an LLM can generate a subset and order it within a single autoregressive pass. However, this flexibility