AI RESEARCH
LatentDiff: Scaling Semantic Dataset Comparison to Millions of Images
arXiv cs.LG
arXiv:2605.00899v1 Announce Type: cross
We present LatentDiff, a scalable framework for semantic dataset comparison that operates directly in the latent space of pretrained vision encoders. By combining sparse autoencoder-based divergence testing with density ratio estimation, LatentDiff identifies interpretable semantic differences between datasets at a fraction of the computational cost of caption-based alternatives. We also