DySCO: Dynamic Attention-Scaling Decoding for Long-Context Language Models

arXiv:2602.22175v2 Announce Type: replace Understanding and reasoning over long contexts is a crucial capability for language models (LMs). Although recent models support increasingly long context windows, their accuracy often deteriorates as input length grows. In practice, models often struggle to keep attention aligned with the most relevant context throughout decoding. In this work, we propose DySCO, a novel decoding algorithm for improving long-context reasoning.