Self-Preference Bias in Rubric-Based Evaluation of Large Language Models

arXiv:2604.06996v1. LLM-as-a-judge has become the de facto approach for evaluating LLM outputs. However, judges are known to exhibit self-preference bias (SPB): they tend to favor outputs produced by themselves or by models from their own family. This skews evaluations and thus hinders model development, especially in settings of recursive self-improvement.