The Missing Evaluation Axis: What 10,000 Student Submissions Reveal About AI Tutor Effectiveness

arXiv:2605.05648v1 Announce Type: cross

Current Artificial Intelligence (AI)-based tutoring systems (AI tutors) are evaluated primarily on the pedagogical quality of their feedback messages. Pedagogical quality is important, but it is insufficient on its own because it ignores a critical question: what do students actually do with the feedback they receive? We argue that AI tutor evaluation should be extended with a behavioral dimension, grounded in student interaction data, that complements pedagogical assessment.