GPrune-LLM: Generalization-Aware Structured Pruning for Large Language Models

arXiv:2603.13418v1 Announce Type: cross Structured pruning is widely used to compress large language models (LLMs), yet its effectiveness depends heavily on neuron importance estimation. Most existing methods estimate neuron importance from activation statistics on a single calibration dataset, which