AI RESEARCH

How Do Document Parsers Break? Auditing Structural Vulnerability in Document Intelligence

arXiv cs.CL

arXiv:2605.19309v1 Announce Type: new

Document Layout Analysis (DLA) pipelines provide structured page representations for retrieval-augmented generation, long-document question answering, and other document intelligence systems, yet evaluation of their robustness remains largely area-centric. We identify this as Footprint Bias and propose a lightweight, output-level auditing framework that decouples probe construction, policy-driven targeting, and structure-aware diagnosis.
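
To make the area-centric concern concrete, here is a minimal Python sketch of one way such a bias could show up. It is an illustration under stated assumptions, not the paper's framework: the `Region` type, the two scoring functions, and the toy page are all hypothetical, and the abstract does not specify how the authors measure footprint or structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Hypothetical layout region: a label plus the page area it covers."""
    label: str   # e.g. "title", "paragraph"
    area: float  # footprint in arbitrary page units


def area_recall(reference, predicted):
    """Area-weighted recall: share of reference area whose region survived."""
    kept = set(predicted)
    total = sum(r.area for r in reference)
    return sum(r.area for r in reference if r in kept) / total


def per_label_recall(reference, predicted):
    """Recall per label, counting regions and ignoring their footprint."""
    kept = set(predicted)
    counts = {}
    for r in reference:
        hit, n = counts.get(r.label, (0, 0))
        counts[r.label] = (hit + (r in kept), n + 1)
    return {label: hit / n for label, (hit, n) in counts.items()}


# Toy page (assumed values): a small title and two large paragraphs.
page = [Region("title", 2.0), Region("paragraph", 60.0), Region("paragraph", 38.0)]
# Hypothetical parser output that silently dropped the title.
parsed = page[1:]

area_score = area_recall(page, parsed)
label_scores = per_label_recall(page, parsed)
```

On this toy page the area-weighted score stays at 0.98 even though the title is lost entirely, while the per-label view reports a title recall of 0.0. That gap is the kind of effect an area-centric evaluation could hide, under the assumptions above.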