AI RESEARCH

BabelDOC: Better Layout-Preserving PDF Translation via Intermediate Representation

arXiv cs.CV

arXiv:2605.10845v1 Announce Type: new As global cross-lingual communication intensifies, language barriers in visually rich documents such as PDFs remain a practical bottleneck. Existing document translation pipelines face a tension between linguistic processing and layout preservation: text-oriented Computer-Assisted Translation (CAT) systems often discard structural metadata, while document parsers focus on extraction and do not support faithful re-rendering after translation. We