AI RESEARCH
GLASS: Global-Local Aggregation for Inference-time Sparsification of LLMs
arXiv CS.AI
arXiv:2508.14302v2 Announce Type: replace-cross Inference-time sparsification is a promising path to deploy large language models (LLMs) on resource-constrained devices, yet existing