QQJ: Quantifying Qualitative Judgment for Scalable and Human-Aligned Evaluation of Generative AI

ArXi:2605.17382v1 Announce Type: cross The rapid progress of generative artificial intelligence has exposed fundamental limitations in existing evaluation methodologies, particularly for open-ended, creative, and human-facing tasks. Traditional automatic metrics rely on surface-level statistical similarity and often fail to reflect human perceptions of quality, while purely human evaluation, although reliable, is costly, subjective, and difficult to scale.