TurboOCR: 270–1200 img/s OCR with Paddle + TensorRT (C++/CUDA, FP16) [P]

I had about 940,000 PDFs to process. Running VLMs over a million pages is slow and expensive, and it's only getting worse as OCR moves toward transformer and VLM-based approaches. They're great for complex understanding, but throughput and cost become a bottleneck at scale. PaddleOCR (the non-VL version), in my opinion the best non-VLM open-source OCR, only managed ~15 img/s on my RTX 5090, which was still too slow, and PaddleOCR-VL was crawling at 2 img/s with vLLM. The bottleneck: PaddleOCR runs single-threaded Python with FP32 inference and no kernel fusion.
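To put those rates in perspective, here is a rough back-of-the-envelope sketch. It assumes a hypothetical workload of exactly 1,000,000 images (one image per page) and perfectly sustained throughput. The real page count for the 940,000 PDFs may differ, and the function name and labels are just for illustration.

```python
# Rough wall-clock estimate for an OCR job at a fixed, sustained throughput.
# ASSUMPTION: 1,000,000 images total; the real page count may be different.

def hours_to_process(num_images: int, images_per_second: float) -> float:
    """Return the hours needed to process num_images at a constant rate."""
    return num_images / images_per_second / 3600.0

NUM_IMAGES = 1_000_000  # hypothetical workload size

# Throughput figures quoted in the post (img/s).
RATES = {
    "PaddleOCR-VL + vLLM": 2,
    "PaddleOCR (stock)": 15,
    "TurboOCR (low end)": 270,
    "TurboOCR (high end)": 1200,
}

for name, rate in RATES.items():
    hours = hours_to_process(NUM_IMAGES, rate)
    print(f"{name:<22} {rate:>5} img/s  {hours:8.2f} h")
```

Under that assumption, 2 img/s works out to about 139 hours and 15 img/s to about 18.5 hours, while 270 img/s brings it down to roughly an hour. Multi-page PDFs scale all of these numbers up proportionally.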