EnTaCs: Analyzing the Relationship Between Sentiment and Language Choice in English-Tamil Code-Switching

arXiv:2603.26587v1

This paper investigates the relationship between utterance sentiment and language choice in English-Tamil code-switched text, using methods from machine learning and statistical modelling. We apply a fine-tuned XLM-RoBERTa model for token-level language identification on 35,650 romanized YouTube comments from the DravidianCodeMix dataset, producing per-utterance measurements of English proportion and language switch frequency.
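
To make the two per-utterance measurements concrete, the following is a minimal sketch of how they could be derived from token-level language tags. It is not the authors' implementation: the tag labels ("en", "ta"), the function name utterance_metrics, the choice to ignore tokens with any other tag, and the definition of switch frequency as a raw count of adjacent language changes are all assumptions made for illustration. The paper may normalize or define these quantities differently.

```python
from typing import List, Tuple


def utterance_metrics(tags: List[str]) -> Tuple[float, int]:
    """Return (English proportion, switch count) for one utterance.

    `tags` holds one language label per token, as a token-level language
    identifier might emit. The labels "en" and "ta" are hypothetical.
    """
    # Assumption: tokens tagged as anything else (names, emoji, etc.) are ignored.
    lang_tags = [t for t in tags if t in ("en", "ta")]
    if not lang_tags:
        return 0.0, 0

    # Share of language-tagged tokens identified as English.
    english_proportion = sum(t == "en" for t in lang_tags) / len(lang_tags)

    # Assumption: a switch is any pair of adjacent tokens with different labels.
    switches = sum(a != b for a, b in zip(lang_tags, lang_tags[1:]))
    return english_proportion, switches


# Hypothetical tag sequence for a six-token comment.
proportion, switches = utterance_metrics(["ta", "ta", "en", "en", "ta", "other"])
```

In this example, two of the five language-tagged tokens are English and the language changes twice (Tamil to English, then back to Tamil). Dividing the switch count by the number of token boundaries would give a length-normalized variant.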