AI RESEARCH
AstroConcepts: A Large-Scale Multi-Label Classification Corpus for Astrophysics
arXiv cs.LG
arXiv:2604.02156v1 Announce Type: cross
Scientific multi-label text classification suffers from extreme class imbalance: specialized terminology follows severe power-law distributions that challenge standard classification approaches. Existing scientific corpora lack comprehensive controlled vocabularies and focus instead on broad categories, which limits systematic study of extreme imbalance. We