Mitigating Selection Bias in Large Language Models via Permutation-Aware GRPO

arXiv:2603.21016v1 Announce Type: cross Large language models (LLMs) used for multiple-choice and pairwise evaluation tasks often exhibit selection bias driven by non-semantic factors such as option position and label symbols. Existing inference-time debiasing is costly and may harm reasoning, while pointwise