Context-Length Robustness in Question Answering Models: A Comparative Empirical Study

arXiv:2603.15723v1

Large language models are increasingly deployed in settings where the relevant information is embedded in long, noisy contexts. Despite this, their robustness to growing context length remains poorly understood across question answering tasks. In this work, we present a controlled empirical study of context-length robustness in large language models using two widely used benchmarks: SQuAD and HotpotQA.
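The abstract does not say how context length is controlled. The sketch below shows one common way to run such a study, under several assumptions. Each gold passage is padded with distractor passages until it reaches a target length, with word count standing in for tokens. The gold passage is placed at a chosen relative position. Exact match is then measured at each length. The helper names `build_padded_context` and `accuracy_by_length` and the `answer_fn` model callback are hypothetical and do not come from the paper. The answer normalization mirrors the one in the official SQuAD evaluation script: lowercase the text, then strip punctuation, articles, and extra whitespace.

```python
import random
import re
import string
from typing import Callable, Dict, List

PUNCT = set(string.punctuation)


def normalize_answer(s: str) -> str:
    """SQuAD-style normalization: lowercase, drop punctuation and articles, collapse spaces."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in PUNCT)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def exact_match(prediction: str, gold: str) -> bool:
    return normalize_answer(prediction) == normalize_answer(gold)


def build_padded_context(gold: str, distractors: List[str], target_words: int,
                         position: float = 0.5, seed: int = 0) -> str:
    """Hypothetical helper: pad `gold` with shuffled distractors until the
    context has at least `target_words` words (or distractors run out).
    `position` in [0, 1] sets where the gold passage sits among them."""
    rng = random.Random(seed)
    pool = list(distractors)
    rng.shuffle(pool)
    padding: List[str] = []
    length = len(gold.split())
    for passage in pool:
        if length >= target_words:
            break
        padding.append(passage)
        length += len(passage.split())
    idx = round(position * len(padding))  # insertion index for the gold passage
    return "\n\n".join(padding[:idx] + [gold] + padding[idx:])


def accuracy_by_length(examples: List[dict], distractors: List[str],
                       lengths: List[int],
                       answer_fn: Callable[[str, str], str]) -> Dict[int, float]:
    """Exact-match accuracy at each target context length.
    `answer_fn(context, question)` is an assumed stand-in for a model call."""
    results: Dict[int, float] = {}
    for n in lengths:
        correct = 0
        for ex in examples:
            ctx = build_padded_context(ex["context"], distractors, n)
            correct += exact_match(answer_fn(ctx, ex["question"]), ex["answer"])
        results[n] = correct / len(examples)
    return results
```

Sweeping `lengths` (and `position`) with a fixed `seed` changes only the amount and placement of noise around the same gold evidence. This is the kind of controlled comparison the abstract describes. A real study would measure length in model tokens rather than whitespace-separated words.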