LLMs Judge Themselves: A Game-Theoretic Framework for Human-Aligned Evaluation

arXiv:2510.15746v2

Ideal or real: that is the question. In this work, we explore whether principles from game theory can be effectively applied to the evaluation of large language models (LLMs). This inquiry is motivated by the growing inadequacy of conventional evaluation practices, which often rely on fixed-format tasks with reference answers and struggle to capture the nuanced, subjective, and open-ended nature of modern LLM behavior.