AI RESEARCH
MEDiC: Multi-objective Exploration of Distillation from CLIP
arXiv cs.CV
arXiv:2603.29009v1 Announce Type: new
Masked image modeling (MIM) methods typically operate in either raw pixel space (reconstructing masked patches) or latent feature space (aligning with a pre-trained teacher). We present MEDiC (Multi-objective Exploration of Distillation from CLIP), a framework that combines both spaces in a single pipeline through three complementary objectives: patch-level token distillation from a frozen CLIP encoder, global CLS alignment, and pixel reconstruction via a lightweight decoder.
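As a rough illustration of how three such objectives might be combined into one training signal, the sketch below computes a weighted sum of a patch-token distillation term, a CLS alignment term, and a masked pixel reconstruction term. The abstract does not state the loss forms or weights, so everything specific here is an assumption: cosine distance for the two feature terms, mean squared error over masked patches for the pixel term, and the hypothetical names `cosine_distance`, `combined_loss`, `w_token`, `w_cls`, and `w_pixel`. The arrays stand in for the outputs of the student, the frozen CLIP teacher, and the decoder.

```python
import numpy as np


def cosine_distance(a, b, eps=1e-8):
    """Mean of (1 - cosine similarity), computed over the last axis."""
    a_n = a / (np.linalg.norm(a, axis=-1, keepdims=True) + eps)
    b_n = b / (np.linalg.norm(b, axis=-1, keepdims=True) + eps)
    return float(np.mean(1.0 - np.sum(a_n * b_n, axis=-1)))


def combined_loss(student_tokens, teacher_tokens, student_cls, teacher_cls,
                  pred_pixels, target_pixels, mask,
                  w_token=1.0, w_cls=1.0, w_pixel=1.0):
    """Weighted sum of three objectives.

    The loss forms and weights are assumptions made for illustration;
    they are not taken from the paper.

    student_tokens, teacher_tokens: (B, N, D) patch-level features
    student_cls, teacher_cls:       (B, D) global CLS features
    pred_pixels, target_pixels:     (B, N, P) flattened patch pixels
    mask:                           (B, N), 1.0 where a patch was masked
    """
    # Latent space: align each student patch token with the teacher's token.
    token_loss = cosine_distance(student_tokens, teacher_tokens)
    # Latent space: align the global CLS representations.
    cls_loss = cosine_distance(student_cls, teacher_cls)
    # Pixel space: mean squared error averaged over masked patches only.
    per_patch_mse = np.mean((pred_pixels - target_pixels) ** 2, axis=-1)
    pixel_loss = float(np.sum(per_patch_mse * mask) / max(np.sum(mask), 1.0))

    total = w_token * token_loss + w_cls * cls_loss + w_pixel * pixel_loss
    return total, {"token": token_loss, "cls": cls_loss, "pixel": pixel_loss}
```

Returning the individual terms alongside the total makes it easier to see which objective dominates when the weights are varied, which is the kind of comparison a multi-objective study would need.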