Sparse or Dense? A Mechanistic Estimation of Computation Density in Transformer-based LLMs

arXiv:2601.22795v2

Transformer-based large language models (LLMs) comprise billions of parameters arranged in deep and wide computational graphs. Several studies on LLM efficiency optimization argue that a significant portion of these parameters can be pruned with only a marginal impact on performance. This suggests that computation is not uniformly distributed across the parameters. We