AI RESEARCH

A Systematic Investigation of Document Chunking Strategies and Embedding Sensitivity

arXiv cs.CL

arXiv:2603.06976v1. Announce Type: new

We present the first large-scale, cross-domain evaluation of document chunking strategies for dense retrieval, a critical but underexplored component of retrieval-augmented systems. We benchmark 36 segmentation methods, spanning fixed-size, semantic, structure-aware, hierarchical, adaptive, and LLM-assisted approaches, across six diverse knowledge domains using five embedding models.
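To make the simplest of these families concrete, here is a minimal sketch of fixed-size chunking with overlap. It is an illustration only, not the paper's implementation: the function name `fixed_size_chunks`, the word-level (whitespace) tokenization, and the default `size` and `overlap` values are all assumptions chosen for readability.

```python
from typing import List


def fixed_size_chunks(text: str, size: int = 8, overlap: int = 2) -> List[str]:
    """Split whitespace-separated words into windows of `size` words.

    Consecutive windows share `overlap` words. Hypothetical helper for
    illustration; real systems usually count model tokens, not words.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")

    words = text.split()
    step = size - overlap  # how far the window advances each iteration
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reaches the end of the document
    return chunks


if __name__ == "__main__":
    for chunk in fixed_size_chunks("a b c d e f g h i j", size=4, overlap=1):
        print(chunk)
```

Each chunk would then be embedded and indexed separately. The overlap is a common way to reduce the chance that a relevant passage is split across a chunk boundary, at the cost of indexing some text twice.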