AI RESEARCH

SPIRE: Structure-Preserving Interpretable Retrieval of Evidence

arXiv CS.CL

arXiv:2604.20849v1 Announce Type: cross Retrieval-augmented generation over semi-structured sources such as HTML is constrained by a mismatch between document structure and the flat, sequence-based interfaces of today's embedding and generative models. Retrieval pipelines often linearize documents into fixed-size chunks before indexing, which obscures section structure, lists, and tables, and makes it difficult to return small, citation-ready evidence without losing the surrounding context that makes it interpretable.
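
To make the linearize-then-chunk problem concrete, the following is a minimal sketch of the kind of baseline pipeline the abstract criticizes, not the method proposed in the paper. The sample HTML, the 24-character chunk size, and the helper names `linearize` and `fixed_size_chunks` are all hypothetical and chosen only for illustration; real pipelines typically chunk by tokens rather than characters.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes only and discards every tag (the 'linearize' step)."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        text = data.strip()
        if text:  # skip whitespace-only nodes between tags
            self.parts.append(text)


def linearize(html: str) -> str:
    """Flatten HTML into a single space-separated string of its text nodes."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def fixed_size_chunks(text: str, size: int) -> list[str]:
    """Cut the text into consecutive windows of at most `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]


# Hypothetical sample document: a heading followed by a small table.
SAMPLE_HTML = """
<h2>Volume</h2>
<table>
  <tr><th>Cup</th><th>Size</th></tr>
  <tr><td>Small</td><td>250 ml</td></tr>
  <tr><td>Large</td><td>500 ml</td></tr>
</table>
"""

flat = linearize(SAMPLE_HTML)
chunks = fixed_size_chunks(flat, size=24)  # assumed, deliberately tiny chunk size

for i, chunk in enumerate(chunks):
    print(i, repr(chunk))
```

In this toy case the heading, the header row, and the data rows all collapse into one undifferentiated string, and the chunk boundary falls inside the value "250 ml". Neither chunk on its own records which column a value belongs to, which is the loss of structure and interpretable context that the abstract describes.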