Less Is More: Fast and Accurate Reasoning with Cross-Head Unified Sparse Attention

arXiv:2508.07101v2

Large reasoning models achieve strong performance through test-time scaling, but this incurs substantial computational overhead due to long decoding from short prompts. While sparse attention can reduce latency and memory usage, existing methods often degrade reasoning accuracy because selection errors accumulate over long generation horizons, or require costly re
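The abstract does not describe the proposed method, so the sketch below is only a generic illustration of what "selection" means in sparse attention during decoding. It is not the paper's cross-head unified approach. A single decoding query attends to just the top-k cached keys, ranked by dot product. The function names (`attend`, `topk_indices`) and the top-k ranking rule are hypothetical choices made for this example. Any cached token that the selection drops contributes nothing to the output. That per-step approximation is the kind of error the abstract says can accumulate over long generations.

```python
import math


def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values, indices=None):
    """Scaled dot-product attention for one query.

    If `indices` is given, only those cached KV positions are used
    (sparse attention); otherwise every position is used (dense attention).
    """
    idx = list(range(len(keys))) if indices is None else list(indices)
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, keys[i])) / math.sqrt(d) for i in idx]
    weights = softmax(scores)
    out = [0.0] * len(values[0])
    for w, i in zip(weights, idx):
        for j, v in enumerate(values[i]):
            out[j] += w * v
    return out


def topk_indices(query, keys, k):
    """Hypothetical selection rule: keep the k cached positions whose
    query-key dot product is largest, returned in positional order."""
    scores = [sum(q * kk for q, kk in zip(query, key)) for key in keys]
    order = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:k])


if __name__ == "__main__":
    query = [1.0, 0.0]
    keys = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, 0.0]]
    values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.0, 0.0]]
    dense = attend(query, keys, values)
    sparse = attend(query, keys, values, topk_indices(query, keys, 2))
    print("dense:", dense)
    print("sparse (k=2):", sparse)
```

When k equals the cache length, the sparse path reduces exactly to dense attention. Smaller k saves work and memory but drops the contribution of every unselected token.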