AI RESEARCH
MinerU-Diffusion: Rethinking Document OCR as Inverse Rendering via Diffusion Decoding
arXiv CS.CV
•
ArXi:2603.22458v1 Announce Type: new Optical character recognition (OCR) has evolved from line-level transcription to structured document parsing, requiring models to recover long-form sequences containing layout, tables, and formulas. Despite recent advances in vision-language models, most existing systems rely on autoregressive decoding, which