GLASS: Global-Local Aggregation for Inference-time Sparsification of LLMs

arXiv:2508.14302v2 Announce Type: replace-cross Inference-time sparsification is a promising path to deploying large language models (LLMs) on resource-constrained devices, yet existing