Neural Models and Language Model Prompting for the Multidimensional Evaluation of Open-Ended Conversations

arXiv:2509.00841v2 Announce Type: replace The growing number of generative AI-based dialogue systems has made their evaluation a crucial challenge. This paper presents our contribution to this problem through the Dialogue System Technology Challenge (DSTC-12, Track 1), for which we developed models to predict dialogue-level, dimension-specific scores. Given the constraint of using relatively small models (i.e., fewer than 13B parameters), our work follows two main strategies: employing Language Models (LMs) as evaluators through prompting, and.