AI RESEARCH
A Survey of OCR Evaluation Methods and Metrics and the Invisibility of Historical Documents
arXiv cs.CV
arXiv:2603.25761v1 Announce Type: new Optical character recognition (OCR) and document understanding systems increasingly rely on large vision and vision-language models, yet evaluation remains centered on modern, Western, and institutional documents. This emphasis masks system behavior in historical and marginalized archives, where layout, typography, and material degradation shape interpretation. This study examines how OCR and document understanding systems are evaluated, with particular attention to Black historical newspapers.