
AstroConcepts: A Large-Scale Multi-Label Classification Corpus for Astrophysics

arXiv cs.LG

arXiv:2604.02156v1 Announce Type: cross

Scientific multi-label text classification suffers from extreme class imbalance: specialized terminology follows severe power-law distributions that challenge standard classification approaches. Existing scientific corpora lack comprehensive controlled vocabularies and focus instead on broad categories, which limits systematic study of extreme imbalance. We
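The following is a minimal sketch of why a power-law label distribution produces extreme imbalance in multi-label data. It is not taken from the paper. The helper names (`label_frequencies`, `zipf_label_counts`, `imbalance_summary`) are hypothetical. The Zipf exponent of 1.0, the "top 1% of labels" head, and the 20-example tail threshold are illustrative assumptions, not AstroConcepts statistics.

```python
from collections import Counter


def label_frequencies(documents: list[set[str]]) -> Counter:
    """Count how many documents carry each label (multi-label: a document may have several)."""
    return Counter(label for labels in documents for label in labels)


def zipf_label_counts(num_labels: int, top_count: int, exponent: float = 1.0) -> list[int]:
    """Hypothetical power-law frequencies: count(rank) ~ top_count / rank**exponent (assumed model)."""
    return [max(1, round(top_count / rank ** exponent)) for rank in range(1, num_labels + 1)]


def imbalance_summary(
    counts: list[int], head_fraction: float = 0.01, tail_threshold: int = 20
) -> dict[str, float]:
    """Summarize how skewed a label-frequency list is (thresholds are illustrative assumptions)."""
    ordered = sorted(counts, reverse=True)
    total = sum(ordered)
    head_size = max(1, int(len(ordered) * head_fraction))
    return {
        # Ratio between the most and least frequent label.
        "imbalance_ratio": ordered[0] / ordered[-1],
        # Share of all label assignments captured by the most frequent labels.
        "head_share": sum(ordered[:head_size]) / total,
        # Fraction of labels with fewer than `tail_threshold` examples.
        "tail_fraction": sum(c < tail_threshold for c in ordered) / len(ordered),
    }


if __name__ == "__main__":
    counts = zipf_label_counts(num_labels=1000, top_count=10_000)
    print(imbalance_summary(counts))
```

With these assumed parameters, the most frequent label has 1,000 times as many examples as the rarest. The top 1% of labels carry roughly 39% of all assignments, and nearly half of the labels have fewer than 20 examples. This long tail is where standard classifiers tend to struggle.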