Suppressing Non-Semantic Noise in Masked Image Modeling Representations

arXiv:2604.00172v1

Masked Image Modeling (MIM) has become a ubiquitous self-supervised vision paradigm. In this work, we show that MIM objectives cause the learned representations to retain non-semantic information, which ultimately hurts performance during inference. We