StructKV: Preserving the Structural Skeleton for Scalable Long-Context Inference

arXiv:2604.06746v1 Announce Type: new As Large Language Models (LLMs) scale to context windows exceeding one million tokens, the linear growth of the Key-Value (KV) cache imposes severe memory capacity and bandwidth bottlenecks, cons