Creativity Bias: How Machine Evaluation Struggles with Creativity in Literary Translations

arXiv:2605.13596v1 Announce Type: new

This article investigates the performance of automatic evaluation metrics (AEMs) and LLM-as-a-judge evaluation on literary translation across multiple languages, genres, and translation modalities. The aim is to assess how well these tools align with professional judgments when evaluating translation creativity (creative shifts and errors), and to determine whether they can substitute for laborious manual annotation.