BitSkip: An Empirical Analysis of Quantization and Early Exit Composition in Transformers

arXiv:2510.23766v2

The pursuit of efficient Large Language Models (LLMs) has led to increasingly complex techniques such as extreme quantization and dynamic routing. While the individual benefits of these methods are well documented, their compositional effects remain poorly understood. This paper