GitHub - intel/auto-round: A SOTA quantization algorithm for high-accuracy low-bit LLM inference, seamlessly optimized for CPU/XPU/CUDA, with multi-datatype support and full compatibility with vLLM, SGLang, and Transformers.

r/LocalLLaMA
Submitted by /u/muyuu