AI RESEARCH

When Good OCR Is Not Enough: Benchmarking OCR Robustness for Retrieval-Augmented Generation

arXiv cs.CV

arXiv:2605.00911v1. Industrial Retrieval-Augmented Generation (RAG) systems depend on optical character recognition (OCR) to transform visual documents into text. Existing OCR benchmarks rely on character-level metrics, which inadequately measure downstream RAG effectiveness under real-world conditions. We