IMPACT: Importance-Aware Activation Space Reconstruction

arXiv:2507.03828v4

Large language models (LLMs) achieve strong performance across diverse domains, but their size makes them difficult to deploy in resource-constrained environments. Low-rank compression is a common remedy. It typically minimizes weight reconstruction error, on the assumption that the weights are low-rank. However, this assumption often does not hold for LLMs. LLM activations, by contrast, exhibit a pronounced low-rank structure, which motivates approaches that minimize activation reconstruction error instead.
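The difference between the two objectives can be shown with a small NumPy sketch. The two objectives are weight reconstruction, min over rank-k W' of ||W - W'||_F, and activation reconstruction, min over rank-k W' of ||W X - W' X||_F, where X holds calibration inputs as columns. This is a generic, unweighted illustration under stated assumptions, not the importance-aware IMPACT method. The function names, the synthetic data (a full-rank random W and inputs concentrated near an 8-dimensional subspace), and the chosen sizes are all hypothetical. For the activation objective, projecting W onto the top-k left singular vectors of Y = W X yields the best rank-k approximation of Y (Eckart-Young), so no rank-k W' can achieve lower error on X.

```python
import numpy as np


def weight_svd_compress(W, k):
    """Rank-k W' minimizing ||W - W'||_F (truncated SVD of the weights)."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :k] * S[:k]) @ Vt[:k, :]


def activation_aware_compress(W, X, k):
    """Rank-k W' minimizing ||W X - W' X||_F on calibration inputs X.

    Projects W onto the top-k left singular vectors of the outputs Y = W X.
    Storable in factored form as Uk (d_out x k) and Uk.T @ W (k x d_in).
    """
    Y = W @ X
    U, _, _ = np.linalg.svd(Y, full_matrices=False)
    Uk = U[:, :k]
    return Uk @ (Uk.T @ W)


def energy_top_k(M, k):
    """Fraction of squared singular-value mass captured by the top k values."""
    s = np.linalg.svd(M, compute_uv=False)
    return float(np.sum(s[:k] ** 2) / np.sum(s ** 2))


def relative_activation_error(W, W_hat, X):
    """||W X - W_hat X||_F / ||W X||_F."""
    return float(np.linalg.norm(W @ X - W_hat @ X) / np.linalg.norm(W @ X))


def make_synthetic(d_out=64, d_in=64, n=512, k=8, noise=0.01, seed=0):
    """Hypothetical data: full-rank weights, near-low-rank activations."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((d_out, d_in))          # flat spectrum, not low-rank
    basis = rng.standard_normal((d_in, k))          # inputs live near a k-dim subspace
    X = basis @ rng.standard_normal((k, n)) + noise * rng.standard_normal((d_in, n))
    return W, X


if __name__ == "__main__":
    k = 8
    W, X = make_synthetic(k=k)
    W_w = weight_svd_compress(W, k)
    W_a = activation_aware_compress(W, X, k)
    print("top-k energy of W:   ", energy_top_k(W, k))
    print("top-k energy of W X: ", energy_top_k(W @ X, k))
    print("activation error, weight SVD:     ", relative_activation_error(W, W_w, X))
    print("activation error, activation SVD: ", relative_activation_error(W, W_a, X))
```

On data like this, the weight spectrum is spread out while the output spectrum is concentrated, so truncating the weights discards much of what the layer actually computes on X. The activation-based projection keeps it, at the cost of depending on the calibration inputs being representative.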