From Plausibility to Verifiability: Risk-Controlled Generative OCR for Vision-Language Models

arXiv:2603.19790v1

Modern vision-language models (VLMs) can act as generative OCR engines, yet open-ended decoding can expose rare but consequential failures. We identify a core deployment misalignment in generative OCR: autoregressive decoding favors semantic plausibility, whereas OCR requires outputs that are visually grounded and geometrically verifiable. This mismatch produces severe errors, especially over-generation and unsupported substitutions, creating deployment risk even when benchmark accuracy remains high.