AI RESEARCH

The Necessity of Setting Temperature in LLM-as-a-Judge

arXiv CS.CL

arXiv:2603.28304v1 Announce Type: new

LLM-as-a-Judge has emerged as an effective and low-cost paradigm for evaluating text quality and factual correctness. Prior studies have shown substantial agreement between LLM judges and human experts, even on tasks that are difficult to assess automatically. In practice, researchers commonly use a fixed temperature setting during evaluation, with 0.1 and 1.0 being the most prevalent values, a convention that is largely empirical rather than principled.
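
To make the two commonly used settings concrete, the following minimal sketch shows the standard temperature-scaled softmax, in which logits are divided by the temperature before normalization. It is an illustration only and is not taken from the paper: the function name `softmax_with_temperature` and the three logit values (standing in for a judge's scores over three candidate verdicts) are hypothetical.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Return softmax(logits / temperature) as a list of probabilities."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for three candidate verdicts (illustrative values only).
logits = [2.0, 1.5, 0.5]

for t in (0.1, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(f"temperature={t}: {[round(p, 3) for p in probs]}")
```

Under these assumed logits, a temperature of 0.1 concentrates almost all probability on the top-scoring verdict, so sampling is close to deterministic, whereas a temperature of 1.0 leaves the distribution unscaled and repeated judgments can differ.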