SEPTQ: A Simple and Effective Post-Training Quantization Paradigm for Large Language Models

arXiv:2604.10091v1

Large language models (LLMs) have shown remarkable performance across various domains, but their deployment is constrained by massive computational and storage costs. Quantization, an effective technique for compressing models to fit resource-limited devices while preserving generative quality, encompasses two primary methods: quantization-aware