Demystifying When Pruning Works via Representation Hierarchies

arXiv:2603.24652v1 Announce Type: cross Network pruning, which removes less important parameters or architectures, is often expected to improve efficiency while preserving performance. However, this expectation does not consistently hold across language tasks: pruned models can perform well on non-generative tasks but frequently fail in generative settings.