AI RESEARCH

MinerU-Diffusion: Rethinking Document OCR as Inverse Rendering via Diffusion Decoding

arXiv CS.CV

ArXi:2603.22458v1 Announce Type: new Optical character recognition (OCR) has evolved from line-level transcription to structured document parsing, requiring models to recover long-form sequences containing layout, tables, and formulas. Despite recent advances in vision-language models, most existing systems rely on autoregressive decoding, which