Who and What? Using Linguistic Features and Annotator Characteristics to Analyze Annotation Variation

arXiv:2605.06318v1 Announce Type: new Human label variation has been established as a central phenomenon in NLP: the different perspectives annotators have on the same item need to be embraced. Data collection practices have thus shifted towards increasing the number of annotators and releasing disaggregated datasets, with harmful language being the best-resourced domain due to its high subjectivity. While this has resulted in rich information about \textit{who} annotated (sociographics, attitudes, etc.), the \textit{what} (e.g., linguistic properties of items) and the interplay between the two have received little attention.