PILOT: A Promptable Interleaved Layout-aware OCR Transformer

arXiv CS.CV

arXiv:2504.03621v2 Announce Type: replace

Classical OCR pipelines decompose document reading into detection, segmentation, and recognition stages, which makes them sensitive to localization errors and difficult to extend to interactive querying. This work investigates whether a single compact model can jointly perform text recognition and spatial grounding on both handwritten and printed documents. We