Repetition over Diversity: High-Signal Data Filtering for Sample-Efficient German Language Modeling
arXiv CS.AI
arXiv:2604.28075v1 Announce Type: cross Recent research has shown that filtering massive English web corpora into high-quality subsets significantly improves