Beyond Imbalance Ratio: Data Characteristics as Critical Moderators of Oversampling Method Selection
arXiv cs.LG
arXiv:2604.04541v1 Announce Type: new

The prevailing IR-threshold paradigm posits a positive correlation between imbalance ratio (IR) and oversampling effectiveness, yet this assumption has not been empirically tested through controlled experimentation. We conducted 12 controlled experiments (N > 100 dataset variants) that systematically manipulated IR while holding data characteristics (class separability, cluster structure) constant via algorithmic generation of Gaussian mixture datasets. Two additional validation experiments examined ceiling effects and metric dependence.
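The abstract does not describe its generator or name the oversamplers it compared, so the following is only a minimal sketch of the general design, under stated assumptions. It assumes a two-class mixture of isotropic Gaussians whose cluster centres come from a seed that does not depend on IR, so varying IR changes only how many minority points are drawn while separability and cluster structure stay fixed. SMOTE-style interpolation stands in as a representative oversampler. The names `make_gaussian_mixture` and `smote`, and every parameter (`separation`, `clusters_per_class`, `cluster_spread`, `k`), are hypothetical and not taken from the paper.

```python
import math
import random


def make_gaussian_mixture(n_majority, imbalance_ratio, separation=3.0,
                          clusters_per_class=2, cluster_spread=1.0,
                          dim=2, seed=0):
    """Hypothetical generator: each class is a mixture of isotropic Gaussians.

    IR = n_majority / n_minority. Cluster centres come from a seed that does
    not depend on IR, so geometry stays fixed while only the minority count
    changes across IR levels.
    """
    centre_rng = random.Random(seed)      # fixed geometry, independent of IR
    sample_rng = random.Random(seed + 1)  # point draws

    def centres(offset):
        # Scatter cluster centres around an anchor shifted along axis 0.
        out = []
        for _ in range(clusters_per_class):
            c = [centre_rng.gauss(0.0, 1.0) for _ in range(dim)]
            c[0] += offset
            out.append(c)
        return out

    maj_centres = centres(0.0)
    min_centres = centres(separation)  # assumed proxy for class separability
    n_minority = max(1, round(n_majority / imbalance_ratio))

    def draw(cs, n):
        # Round-robin assignment keeps per-cluster sizes balanced.
        return [[ci + sample_rng.gauss(0.0, cluster_spread) for ci in cs[i % len(cs)]]
                for i in range(n)]

    X = draw(maj_centres, n_majority) + draw(min_centres, n_minority)
    y = [0] * n_majority + [1] * n_minority
    return X, y


def smote(minority, n_synthetic, k=5, seed=0):
    """SMOTE-style oversampling: interpolate toward a random k-nearest neighbour."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, excluding the point itself.
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: math.dist(p, base))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> nb
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


# Usage: vary IR with geometry held fixed, then oversample to balance.
for ir in (2, 5, 10, 20):
    X, y = make_gaussian_mixture(n_majority=200, imbalance_ratio=ir, seed=42)
    minority = [x for x, label in zip(X, y) if label == 1]
    n_needed = y.count(0) - len(minority)
    X_bal = X + smote(minority, n_needed, k=5, seed=ir)
    print(f"IR={ir}: minority={len(minority)}, balanced size={len(X_bal)}")
```

Because the majority points are drawn first from a separate stream, they are identical at every IR level, and the minority samples at a higher IR are a prefix of those at a lower IR. Any change in downstream oversampling benefit can therefore be attributed to IR rather than to a shift in data geometry, which is the kind of control the abstract describes.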