AI RESEARCH
Composite Silhouette: A Subsampling-based Aggregation Strategy
arXiv CS.LG
arXiv:2604.13816v1 Announce Type: new Determining the number of clusters is a central challenge in unsupervised learning, where ground-truth labels are unavailable. The Silhouette coefficient is a widely used internal validation metric for this task, yet its standard micro-averaged form tends to favor larger clusters under size imbalance. Macro-averaging mitigates this bias by weighting clusters equally, but may overemphasize noise from under-represented groups. We
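To make the micro versus macro distinction concrete, here is a minimal sketch in plain Python of the two standard averaging schemes the abstract contrasts. It is not the paper's composite method, which the excerpt does not describe. The function names (`silhouette_values`, `micro_silhouette`, `macro_silhouette`) and the toy data are illustrative assumptions, and Euclidean distance is assumed.

```python
import math
from collections import defaultdict


def silhouette_values(points, labels):
    """Per-point silhouette s(i) = (b - a) / max(a, b), Euclidean distance.

    Assumes at least two distinct cluster labels.
    """
    clusters = defaultdict(list)
    for idx, lab in enumerate(labels):
        clusters[lab].append(idx)

    values = []
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            # Usual convention: a singleton cluster scores 0.
            values.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        values.append((b - a) / denom if denom > 0 else 0.0)
    return values


def micro_silhouette(values):
    """Micro-average: every point counts equally, so large clusters dominate."""
    return sum(values) / len(values)


def macro_silhouette(values, labels):
    """Macro-average: every cluster counts equally, whatever its size."""
    per_cluster = defaultdict(list)
    for v, lab in zip(values, labels):
        per_cluster[lab].append(v)
    cluster_means = [sum(vs) / len(vs) for vs in per_cluster.values()]
    return sum(cluster_means) / len(cluster_means)


# Toy data (hypothetical): one large, tight cluster and one small, loose cluster.
points = [(0.0,), (0.1,), (0.2,), (0.3,), (0.4,), (0.5,), (1.0,), (3.0,)]
labels = [0, 0, 0, 0, 0, 0, 1, 1]

values = silhouette_values(points, labels)
micro = micro_silhouette(values)
macro = macro_silhouette(values, labels)
print(f"micro={micro:.3f} macro={macro:.3f}")
```

In this toy case the micro-average comes out higher than the macro-average, because the six well-separated points of the large cluster outvote the two poorly clustered points of the small one. The macro-average gives the small cluster half the weight, which exposes its weak cohesion but also makes the score sensitive to just two points.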