SVD Contextual Sparsity Predictors for Fast LLM Inference

arXiv:2603.14110v1

Contextual sparsity is one approach to reducing the computational cost of inference in large language models (LLMs). Existing contextual-sparsity techniques that accelerate LLM inference with minimal accuracy degradation require