Qianfan-OCR: A Unified End-to-End Model for Document Intelligence

arXiv:2603.13398v1 Announce Type: new We present Qianfan-OCR, a 4B-parameter end-to-end vision-language model that unifies document parsing, layout analysis, and document understanding within a single architecture. It performs direct image-to-Markdown conversion and supports diverse prompt-driven tasks, including table extraction, chart understanding, document QA, and key information extraction.