AI RESEARCH

Toxicity Detection Should Measure Contextual Harm, Not Text-Intrinsic Badness

arXiv CS.AI

arXiv:2503.16072v4 Announce Type: replace-cross Toxicity detection has become core safety infrastructure for online moderation, dataset filtering, and deployed language-model systems. Yet most detectors still treat toxicity as an intrinsic property of isolated text. This position paper argues that toxicity detection should be evaluated as the contextual measurement of situated communicative harm, rather than as single-label text classification. Toxicity is not contained in words alone; it emerges when a communicative act is interpreted by an audience within a normative and social context.