AI RESEARCH

Leveraging Data Symmetries to Select an Optimal Subset of Training Data under Label Noise

arXiv CS.LG

arXiv:2605.01874v1 Announce Type: new The performance of machine learning models often relies on large labeled datasets; however, data collected from diverse sources can contain label noise. Recent work has shown that, in noisy settings, there may exist a subset of the