APreQEL: Adaptive Mixed Precision Quantization For Edge LLMs

arXiv:2603.23575v1

Large language models have demonstrated strong performance on a wide range of tasks, including reasoning, code generation, and complex problem solving. However, these capabilities come with high computational cost and memory requirements. This makes it difficult to deploy such models on edge devices, where on-device inference would enable real-time responses and preserve data privacy. Quantization is a common approach to reducing memory use, but most methods apply it uniformly across all layers. This ignores the fact that different layers may respond differently to reduced precision.
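The sketch below shows why a single bit width for every layer can be a poor fit. It is not the APreQEL method. It uses plain symmetric per-tensor uniform quantization on two synthetic "layers". One layer has smoothly distributed weights. The other has the same distribution plus a few large outliers. The outliers stretch the quantization scale, so the same 4-bit setting loses far more accuracy on that layer. The layer names, the synthetic weights, and the `choose_bits` rule (pick the smallest candidate bit width whose reconstruction MSE is under a fixed tolerance) are hypothetical illustrations only.

```python
import random


def quantize_dequantize(weights, bits):
    """Symmetric per-tensor uniform quantization followed by dequantization."""
    qmax = 2 ** (bits - 1) - 1                  # e.g. 7 for signed 4-bit
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    out = []
    for w in weights:
        # Round to the nearest integer level (Python's round uses ties-to-even),
        # then clamp to the representable range [-qmax, qmax].
        q = max(-qmax, min(qmax, round(w / scale)))
        out.append(q * scale)
    return out


def mse(a, b):
    """Mean squared reconstruction error between two equal-length lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def memory_bytes(num_params, bits):
    """Storage for the quantized weights only (ignores scales and metadata)."""
    return num_params * bits / 8


def choose_bits(weights, candidates=(2, 3, 4, 8), tolerance=2e-5):
    """Hypothetical per-layer rule: smallest bit width with MSE <= tolerance."""
    for bits in sorted(candidates):
        if mse(weights, quantize_dequantize(weights, bits)) <= tolerance:
            return bits
    return max(candidates)


# Synthetic weights for two hypothetical layers (fixed seed for repeatability).
rng = random.Random(0)
n = 4096
smooth = [rng.gauss(0.0, 0.02) for _ in range(n)]
outlier = [rng.gauss(0.0, 0.02) for _ in range(n)]
for i in range(0, n, 512):
    outlier[i] = 1.0  # a few large-magnitude weights stretch the scale

layers = {"layer_a (smooth)": smooth, "layer_b (outliers)": outlier}
plan = {name: choose_bits(w) for name, w in layers.items()}

for name, w in layers.items():
    err4 = mse(w, quantize_dequantize(w, 4))
    print(f"{name}: 4-bit MSE={err4:.2e}, chosen bits={plan[name]}")

mixed = sum(memory_bytes(len(w), plan[name]) for name, w in layers.items())
uniform8 = sum(memory_bytes(len(w), 8) for w in layers.values())
print(f"mixed precision: {mixed:.0f} B, uniform 8-bit: {uniform8:.0f} B")
```

With these toy settings, the smooth layer tolerates a lower bit width than the outlier layer. The resulting mixed assignment therefore needs less weight memory than running every layer at the bit width the most sensitive layer requires. A real system would measure sensitivity through task accuracy or activations rather than raw weight MSE, and would also account for the latency characteristics of the target edge hardware.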