Sharing "cull": my open-source dataset tool for image scraping, classification, and captioning

r/LocalLLaMA

I open-sourced Cull, a tool I built and maintain. It’s a machine curation engine for AI image datasets, the kind of work that eats hours every time you want to train a LoRA, build a reference library, or just classify an archive so it isn’t a 100,000-file mess.

What it does, end to end

Scrapes from Civitai (.com and .red), X/Twitter, Reddit, Discord, plus any URL gallery-dl supports (Pixiv, DeviantArt, the booru family, ArtStation, Tumblr, FurAffinity / e621, Imgur, Flickr, and ~340 others). Drops every image plus its source-side prompt into a local queue.
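To make the scrape-then-queue step concrete, here is a rough sketch of the idea. It is not Cull's actual code. It assumes gallery-dl is run with `--write-metadata`, which writes a JSON sidecar next to each downloaded file (by default named like `image.png.json`). The function names, the queue entry layout, and the `"prompt"` metadata key are all hypothetical, since each site names its fields differently.

```python
import json
import tempfile
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp", ".gif"}


def gallery_dl_command(url: str, dest: Path) -> list[str]:
    """Build a gallery-dl call that saves images plus JSON metadata sidecars.

    Pass the result to subprocess.run(..., check=True) to actually download.
    """
    return ["gallery-dl", "--write-metadata", "-D", str(dest), url]


def build_queue(dest: Path) -> list[dict]:
    """Pair every image in dest with its metadata sidecar, if one exists."""
    entries = []
    for image in sorted(dest.iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skips the .json sidecars and any stray files
        sidecar = image.with_name(image.name + ".json")
        metadata = {}
        if sidecar.exists():
            metadata = json.loads(sidecar.read_text(encoding="utf-8"))
        entries.append({
            "image": image.name,
            # Hypothetical key: the real field name varies by source site.
            "prompt": metadata.get("prompt"),
            "metadata": metadata,
        })
    return entries
```

Usage would look something like `subprocess.run(gallery_dl_command(url, dest), check=True)` followed by `build_queue(dest)`. Images without a sidecar still enter the queue, just with an empty prompt, so nothing gets silently dropped before classification.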