AI RESEARCH

Slow-Fast Inference: Training-Free Inference Acceleration via Within-Sentence Support Stability

arXiv CS.AI

arXiv:2603.12038v1 Announce Type: cross Long-context autoregressive decoding remains expensive because each decoding step must repeatedly process a growing history. We observe a consistent pattern during decoding: within a sentence, and generally within a short semantically coherent span, the dominant attention often remains largely stable. Motivated by this observation, we propose Slow-Fast Inference (SFI), a