Learning Who Disagrees: Demographic Importance Weighting for Modeling Annotator Distributions with DiADEM

arXiv:2604.08425v1 Announce Type: cross When humans label subjective content, they disagree, and that disagreement is not noise. It reflects genuine differences in perspective shaped by annotators' social identities and lived experiences. Yet standard practice still flattens these judgments into a single majority label, and recent LLM-based approaches fare no better: we show that prompted large language models, even with chain-of-thought reasoning, fail to recover the structure of human disagreement. We.