GitHub - intel/auto-round: A SOTA quantization algorithm for high-accuracy low-bit LLM inference, seamlessly optimized for CPU/XPU/CUDA, with multi-datatype support and full compatibility with vLLM, SGLang, and Transformers.
r/LocalLLaMA
Submitted by /u/muyuu