Unlocking ImageNet's Multi-Object Nature: Automated Large-Scale Multilabel Annotation

arXiv:2603.05729v1 The original ImageNet benchmark enforces a single-label assumption, even though many images depict multiple objects. This introduces label noise and limits the richness of the learning signal. Multi-label annotations more accurately reflect real-world visual scenes, where multiple objects co-occur and contribute to semantic understanding, enabling models to learn richer and more robust representations. While prior efforts (e.g., ReaL, ImageNetV2) have improved the validation set, there has not yet been a scalable, high-quality multi-label annotation for the.