ROSE: Reordered SparseGPT for More Accurate One-Shot Large Language Models Pruning

arXiv:2603.05878v1

Pruning is widely recognized as an effective way to reduce the parameter count of large language models (LLMs), which can make deployment and inference more efficient. One classic and prominent approach to one-shot LLM pruning uses second-order information (i.e., the Hessian), pioneered by SparseGPT. However, SparseGPT prunes in a predefined left-to-right order, which leads to suboptimal performance when the weights exhibit columnar patterns. This paper studies the effect of pruning order within the SparseGPT framework.
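To make the role of pruning order concrete, here is a minimal NumPy sketch of Hessian-based, column-by-column pruning in which the column order is an explicit argument. It is an illustration under stated assumptions, not the ROSE method and not the SparseGPT reference implementation: the function names (`prune_in_order`, `reconstruction_error`), the damping factor, and the fixed per-column sparsity are all hypothetical simplifications, and the inverse Hessian is updated by plain Gaussian elimination rather than the Cholesky-based scheme used in practice.

```python
import numpy as np


def prune_in_order(W, X, order, sparsity=0.5, damp=0.01):
    """Prune W (rows x cols) column by column, visiting columns in `order`.

    X is (cols x samples) calibration input. Assumption: every column gets
    the same fraction of zeros, a simplification chosen for brevity.
    """
    order = np.asarray(order)
    rows, cols = W.shape
    # Layer-wise Hessian of the squared reconstruction loss, with damping.
    H = X @ X.T
    H = H + damp * np.mean(np.diag(H)) * np.eye(cols)
    # Apply the pruning order by permuting columns, then go left to right.
    Wp = W[:, order].astype(np.float64)
    Hinv = np.linalg.inv(H[np.ix_(order, order)])
    k = int(rows * sparsity)  # number of weights zeroed per column
    for j in range(cols):
        d = Hinv[j, j]
        w = Wp[:, j].copy()
        # OBS saliency: loss increase caused by zeroing each weight.
        saliency = w ** 2 / d
        q = w.copy()
        q[np.argsort(saliency)[:k]] = 0.0
        # Compensate the pruning error on the not-yet-visited columns.
        err = (w - q) / d
        Wp[:, j + 1:] -= np.outer(err, Hinv[j, j + 1:])
        Wp[:, j] = q
        # Remove column j from the inverse Hessian (Gaussian elimination).
        Hinv = Hinv - np.outer(Hinv[:, j], Hinv[j, :]) / d
    # Undo the permutation so columns return to their original positions.
    out = np.empty_like(Wp)
    out[:, order] = Wp
    return out


def reconstruction_error(W, P, X):
    """Squared Frobenius error between dense and pruned layer outputs."""
    return float(np.linalg.norm(W @ X - P @ X) ** 2)
```

Calling `prune_in_order` with `np.arange(cols)` corresponds to the default left-to-right order; passing any other permutation and comparing `reconstruction_error` for the two results is one way to observe that the order changes which weights absorb the compensation updates, and therefore the final error.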