Efficient Dataset Distillation for Pre-Trained Self-Supervised Models via Statistical Flow Matching

arXiv:2602.05391v2 Announce Type: replace Dataset distillation seeks to synthesize a highly compact dataset that achieves performance comparable to the original dataset on downstream tasks. For classification tasks that use pre-trained self-supervised models as backbones, the earlier linear gradient matching approach optimizes synthetic images by encouraging them to mimic the gradient updates that real images induce on the linear classifier.
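
To make the linear gradient matching idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes that backbone features are already extracted (random arrays stand in for them here), that the linear classifier is trained with softmax cross-entropy, and that the mismatch between gradients is measured by cosine distance; none of these choices is stated in the abstract. The function names `linear_grad` and `gradient_matching_loss` are hypothetical.

```python
import numpy as np

def linear_grad(W, feats, labels):
    """Gradient of the mean softmax cross-entropy w.r.t. linear weights W."""
    logits = feats @ W                              # shape (n, classes)
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    probs[np.arange(len(labels)), labels] -= 1.0    # softmax minus one-hot
    return feats.T @ probs / len(labels)            # shape (dim, classes)

def gradient_matching_loss(W, real_feats, real_labels, syn_feats, syn_labels):
    """Cosine distance between real and synthetic classifier gradients (assumed metric)."""
    g_real = linear_grad(W, real_feats, real_labels).ravel()
    g_syn = linear_grad(W, syn_feats, syn_labels).ravel()
    denom = np.linalg.norm(g_real) * np.linalg.norm(g_syn) + 1e-12
    return 1.0 - (g_real @ g_syn) / denom

# Toy data: random arrays stand in for features from a frozen backbone.
rng = np.random.default_rng(0)
dim, classes = 16, 4
W = rng.normal(size=(dim, classes))                 # current linear classifier
real_feats = rng.normal(size=(64, dim))
real_labels = rng.integers(0, classes, size=64)
syn_feats = rng.normal(size=(classes, dim))         # one synthetic sample per class
syn_labels = np.arange(classes)

loss = gradient_matching_loss(W, real_feats, real_labels, syn_feats, syn_labels)
# Sanity check: matching the real data against itself gives a distance near zero.
self_loss = gradient_matching_loss(W, real_feats, real_labels, real_feats, real_labels)
```

In an actual distillation loop, the synthetic features would come from passing the synthetic images through the frozen backbone, and this loss would be minimized with respect to the image pixels; the sketch only evaluates the objective once.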