Walkthrough: Training a Keep/Trash Classifier on CLIP & DINOv2 Embeddings for SD Coloring Pages

r/StableDiffusion

TL;DR: I run a pipeline that generates coloring-page line art with Stable Diffusion. Manually rating thousands of images was becoming a bottleneck, so I trained a simple logistic-regression classifier on CLIP and DINOv2 embeddings to auto-trash the obvious failures. I tested six classifiers across three embedding models and two feature sets. Result: CLIP-based semantic embeddings beat DINOv2's structural embeddings for quality classification, and a dead-simple linear model gets the job done. In the first real deployment, a conservative threshold let me safely auto-trash 55% of the images.
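For anyone who wants the core idea in code first, here is a minimal, dependency-free sketch of "logistic regression on embeddings plus a conservative trash threshold". Everything in it is an assumption made for illustration: the random 8-dimensional vectors stand in for real CLIP/DINOv2 embeddings, the function names (`train_logreg`, `auto_trash`) are made up, and the `trash_below=0.05` cutoff is a placeholder rather than the threshold used in the deployment. In practice you would probably let scikit-learn's `LogisticRegression` do the fitting instead of the hand-rolled gradient descent below.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def keep_probability(w, b, x):
    # Probability that an image with embedding x is a "keep".
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logreg(X, y, lr=0.5, epochs=300, l2=1e-3):
    # Batch gradient descent on the log loss with a small L2 penalty on w.
    dim, n = len(X[0]), len(X)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for xi, yi in zip(X, y):
            err = keep_probability(w, b, xi) - yi
            for j in range(dim):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def auto_trash(w, b, X, trash_below=0.05):
    # Conservative rule: only trash when the model is very sure it is junk.
    return [i for i, x in enumerate(X) if keep_probability(w, b, x) < trash_below]


# Synthetic stand-in data (assumption): keep = 1, trash = 0.
rng = random.Random(0)
DIM = 8


def fake_embedding(label):
    center = 1.0 if label == 1 else -1.0
    return [rng.gauss(center, 1.0) for _ in range(DIM)]


labels = [i % 2 for i in range(200)]
embeddings = [fake_embedding(label) for label in labels]

w, b = train_logreg(embeddings, labels)
trashed = auto_trash(w, b, embeddings, trash_below=0.05)
wrongly_trashed = [i for i in trashed if labels[i] == 1]

print(f"auto-trashed {len(trashed)} of {len(embeddings)} images")
print(f"keeps lost to auto-trash: {len(wrongly_trashed)}")
```

The point of the low cutoff is the asymmetry: a trash image that slips through just costs a few seconds of manual review, while a good image that gets auto-trashed is gone. Lowering `trash_below` trades a smaller auto-trash share for fewer lost keeps.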