Worst-Group Equalized Odds Regularization for Multi-Attribute Fair Medical Image Classification

arXiv:2605.19214v1 Announce Type: new Diagnostic performance in medical AI varies systematically across demographic groups, yet subgroup AUC can mask clinically important disparities. At a fixed inference-time operating point, some groups may exhibit over-diagnostic behaviour, characterized by elevated true and false positive rates, while others show under-diagnostic behaviour, with reduced true and false positive rates. These opposing tendencies can cancel out in aggregate AUCs while still producing meaningful inequities in clinical decision-making.
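The over- and under-diagnostic patterns described above can be made concrete by computing per-group true and false positive rates at one shared threshold. The following is a minimal sketch on toy data, not the paper's method: the helper names `group_rates` and `worst_group_eo_gap`, the threshold of 0.5, and the definition of the gap as the largest between-group difference in TPR or FPR are all assumptions made for illustration.

```python
import math

def group_rates(y_true, y_score, groups, threshold=0.5):
    """Return {group: (TPR, FPR)} at one fixed decision threshold."""
    counts = {}
    for y, s, g in zip(y_true, y_score, groups):
        c = counts.setdefault(g, {"tp": 0, "fn": 0, "fp": 0, "tn": 0})
        predicted_positive = s >= threshold
        if y == 1:
            c["tp" if predicted_positive else "fn"] += 1
        else:
            c["fp" if predicted_positive else "tn"] += 1
    rates = {}
    for g, c in counts.items():
        pos = c["tp"] + c["fn"]
        neg = c["fp"] + c["tn"]
        # A rate is undefined (NaN) when a group has no positives or no negatives.
        tpr = c["tp"] / pos if pos else float("nan")
        fpr = c["fp"] / neg if neg else float("nan")
        rates[g] = (tpr, fpr)
    return rates

def worst_group_eo_gap(rates):
    """Assumed definition: the larger of the TPR range and the FPR range across groups."""
    tprs = [t for t, _ in rates.values() if not math.isnan(t)]
    fprs = [f for _, f in rates.values() if not math.isnan(f)]
    return max(max(tprs) - min(tprs), max(fprs) - min(fprs))

# Toy data (hypothetical): group A leans over-diagnostic, group B under-diagnostic.
y_true  = [1, 1, 1, 1, 0, 0, 0, 0,   1, 1, 1, 1, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.6, 0.7, 0.6, 0.3, 0.2,
           0.9, 0.8, 0.4, 0.3, 0.4, 0.3, 0.2, 0.1]
groups  = ["A"] * 8 + ["B"] * 8

rates = group_rates(y_true, y_score, groups)
gap = worst_group_eo_gap(rates)
print(rates)  # {'A': (1.0, 0.5), 'B': (0.5, 0.0)}
print(gap)    # 0.5
```

In this toy example, group A has both rates elevated and group B has both reduced at the same threshold, which is the kind of disparity that a threshold-free summary such as AUC does not report directly.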