AI RESEARCH

Structure-Aware Chunking for Tabular Data in Retrieval-Augmented Generation

arXiv CS.CL

arXiv:2605.00318v1

Tabular documents such as CSV and Excel files are widely used in enterprise data pipelines, yet existing chunking strategies for retrieval-augmented generation (RAG) are primarily designed for unstructured text and do not account for tabular structure. We propose a structure-aware tabular chunking (STC) framework that operates on row-level units by constructing a hierarchical Row Tree representation, where each row is encoded as a key-value block.
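To make the row-level idea concrete, here is a minimal Python sketch of encoding each row as a key-value block and packing whole rows into chunks. It is not the authors' implementation: the abstract does not describe the Row Tree's layout, so the two-level tree (table node over row nodes), the names `RowNode`, `TableNode`, `build_row_tree`, and `chunk_rows`, and the character budget `max_chars` are all assumptions made for illustration.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class RowNode:
    """One table row, stored as ordered (column, value) pairs."""
    index: int
    fields: list[tuple[str, str]]

    def to_block(self) -> str:
        # Render the row as a key-value block: one "column: value" per line.
        return "\n".join(f"{key}: {value}" for key, value in self.fields)


@dataclass
class TableNode:
    """Root of a hypothetical two-level row tree: a table that owns its rows."""
    name: str
    columns: list[str]
    rows: list[RowNode] = field(default_factory=list)


def build_row_tree(csv_text: str, name: str = "table") -> TableNode:
    """Parse CSV text into a TableNode whose children are RowNodes."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = list(reader.fieldnames or [])
    root = TableNode(name=name, columns=columns)
    for i, record in enumerate(reader):
        # Missing cells come back as None; store them as empty strings.
        pairs = [(col, record[col] or "") for col in columns]
        root.rows.append(RowNode(index=i, fields=pairs))
    return root


def chunk_rows(root: TableNode, max_chars: int = 200) -> list[str]:
    """Pack whole row blocks into chunks; a row is never split across chunks."""
    chunks: list[str] = []
    current: list[str] = []
    size = 0
    for row in root.rows:
        block = row.to_block()
        # Start a new chunk if adding this row would exceed the budget.
        if current and size + len(block) > max_chars:
            chunks.append("\n\n".join(current))
            current, size = [], 0
        current.append(block)
        size += len(block)
    if current:
        chunks.append("\n\n".join(current))
    return chunks


# Illustrative input only; not data from the paper.
SAMPLE = "id,name,region\n1,Ada,EU\n2,Grace,US\n3,Linus,EU\n"
tree = build_row_tree(SAMPLE, name="customers")
for chunk in chunk_rows(tree, max_chars=60):
    print(chunk)
    print("---")
```

Because every chunk boundary falls between rows, each retrieved chunk carries complete records with their column names attached, which is the property a text-oriented splitter cannot guarantee for tables.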