AI RESEARCH

TexOCR: Advancing Document OCR Models for Compilable Page-to-LaTeX Reconstruction

arXiv cs.CL

arXiv:2604.22880v1 Announce Type: new Existing document OCR largely targets plain text or Markdown, discarding the structural and executable properties that make LaTeX essential for scientific publishing. We study page-level reconstruction of scientific PDFs into compilable LaTeX and