
A Mechanism and Optimization Study on the Impact of Information Density on User-Generated Content Named Entity Recognition

arXiv CS.CL

arXiv:2604.18944v1

Named Entity Recognition (NER) models trained on clean, high-resource corpora exhibit catastrophic performance collapse when deployed on noisy, sparse User-Generated Content (UGC), such as social media text. Prior research has predominantly focused on point-wise symptom remediation, employing customized fine-tuning to address issues such as neologisms, alias drift, non-standard orthography, long-tail entities, and class imbalance. However, these improvements often fail to generalize because they overlook the structural sparsity inherent in