Examining Reasoning LLMs-as-Judges in Non-Verifiable LLM Post-Training

arXiv:2603.12246v1 Announce Type: new Reasoning LLMs-as-Judges, which can benefit from inference-time scaling, provide a promising path for extending the success of reasoning models to non-verifiable domains where the output correctness/quality cannot be directly checked. However, while reasoning judges have shown better performance on static evaluation benchmarks, their effectiveness in actual policy