Reducing Class Bias In Data-Balanced Datasets Through Hardness-Based Resampling

ArXi:2504.07031v2 Announce Type: replace Class-bias, that is class-wise performance disparities, is typically attributed to data imbalance and addressed through frequency-based resampling. However, we nstrate that substantial bias persists even in perfectly balanced datasets, proving that class frequency alone cannot explain unequal model performance. We investigate these disparities through the lens of class-level learning difficulty and propose Hardness-Based Resampling (HBR), a strategy that leverages hardness estimates to guide data selection. To better capture these effects, we