Comparison of Modern Multilingual Text Embedding Techniques for Hate Speech Detection Task

arXiv:2604.14907v1 Announce Type: cross Online hate speech and abusive language pose a growing challenge for content moderation, especially in multilingual settings and for low-resource languages such as Lithuanian. This paper investigates to what extent modern multilingual sentence embedding models can support accurate hate speech detection in Lithuanian, Russian, and English, and how their performance depends on downstream modeling choices and feature dimensionality. We