AI RESEARCH

Multi-domain Multi-modal Document Classification Benchmark with a Multi-level Taxonomy

arXiv cs.CL

arXiv:2605.10550v2 (announce type: replace)

Document classification forms the backbone of modern enterprise content management, yet existing benchmarks remain trapped in oversimplified paradigms -- single-domain settings with flat label structures -- that bear little resemblance to the hierarchical, multi-modal, and cross-domain nature of real-world business documents. This gap not only misrepresents practical complexity but also stifles progress toward industrially viable document intelligence.