
The Effect of Document Selection on Query-focused Text Analysis

arXiv cs.CL

arXiv:2604.12099v1 Announce Type: cross

Analyses of document collections often require selecting which documents to analyze, since not all documents are relevant to a particular research question and computational constraints preclude analyzing every document. Yet little work has examined the effects of selection strategy choices. We systematically evaluate seven selection methods (from random selection to hybrid retrieval) on outputs from four text analysis methods (LDA, BERTopic, TopicGPT, HiCode) over two datasets with 26 open-ended queries.
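To make the two ends of that range concrete, the following is a minimal sketch of random selection and one possible form of hybrid retrieval. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not specify the seven methods, so the function names (`random_select`, `bm25_scores`, `trigram_cosine_scores`, `hybrid_select`) are hypothetical, the character-trigram cosine is only a dependency-free stand-in for a dense embedding retriever, and reciprocal rank fusion is an assumed way of combining the two rankings.

```python
import math
import random
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; an assumption made to keep the sketch small.
    return text.lower().split()


def random_select(docs, k, seed=0):
    """Baseline: pick k document indices uniformly at random."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(len(docs)), k))


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Lexical relevance: Okapi BM25 score of each document for the query."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    avg_len = sum(len(t) for t in tokenized) / n
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def trigram_cosine_scores(query, docs):
    """Stand-in for a dense retriever: cosine over character-trigram counts."""
    def vec(text):
        t = text.lower()
        return Counter(t[i:i + 3] for i in range(len(t) - 2))

    def cos(a, b):
        dot = sum(a[g] * b[g] for g in a if g in b)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    q = vec(query)
    return [cos(q, vec(d)) for d in docs]


def hybrid_select(query, docs, k, rrf_k=60):
    """Fuse both rankings with reciprocal rank fusion and keep the top k."""
    fused = [0.0] * len(docs)
    for scores in (bm25_scores(query, docs), trigram_cosine_scores(query, docs)):
        ranking = sorted(range(len(docs)), key=lambda i: -scores[i])
        for rank, i in enumerate(ranking, start=1):
            fused[i] += 1.0 / (rrf_k + rank)
    return sorted(range(len(docs)), key=lambda i: -fused[i])[:k]
```

Either function returns indices into `docs`, so the selected subset can be handed to any downstream analysis (for example a topic model) and the outputs compared across selection strategies.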