Turbo-OCR for high-volume image and PDF processing

r/LocalLLaMA

I recently had to process ~940,000 PDFs. I started with the standard OCR tools, but the bottlenecks were frustrating. Even on an RTX 5090, throughput was low.

The Problem:

PaddleOCR (the most popular open source OCR): Maxed out at ~15 img/s. GPU utilization hovered around 15%. Their high-performance inference mode doesn't support Blackwell GPUs yet (needs CUDA < 12.8) and doesn't work with the Latin recognition model either.

VLM OCR (via vLLM): Great accuracy, but crawled at 2 img/s. At a million pages, the time/cost was prohibitive.
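To put those rates in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes a round 1,000,000 pages, one image per page, and a steady rate with no loading or preprocessing overhead. The helper name `hours_to_process` is made up for illustration; only the two throughput figures come from the numbers above.

```python
def hours_to_process(pages: int, images_per_second: float) -> float:
    """Wall-clock hours to OCR `pages` single-page images at a steady rate."""
    if images_per_second <= 0:
        raise ValueError("images_per_second must be positive")
    return pages / images_per_second / 3600


PAGES = 1_000_000  # assumption: round page count, one image per page

# Throughput figures quoted in the post (images per second).
rates = {"PaddleOCR": 15.0, "VLM OCR via vLLM": 2.0}

for name, rate in rates.items():
    hours = hours_to_process(PAGES, rate)
    print(f"{name}: {hours:.1f} h ({hours / 24:.1f} days)")
```

Under those assumptions this works out to roughly 18.5 hours for PaddleOCR and about 139 hours (close to 6 days) for the VLM route on a single GPU. Real runs would likely take longer once PDF rendering and I/O are included.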