Capability-Guided Compression: Toward Interpretability-Aware Budget Allocation for Large Language Models

arXiv:2603.16440v1 Announce Type: new Large language model compression has made substantial progress through pruning, quantization, and low-rank decomposition, yet a fundamental limitation persists across all existing methods: compression budgets are allocated without any representation of what individual model components functionally encode.