Outlier Smoothing with Closed-Form Rotations for W4A4 Large Language Model Quantization

arXiv:2511.22316v2 Announce Type: replace Quantizing large language models (LLMs) facilitates their deployment in resource-limited settings, but existing methods combine gradient optimization with quantization truncation, two mutually incompatible components, and this leads to serious convergence pathology. The result is prolonged quantization time and degraded task performance. Our studies confirm that the Straight-Through Estimator (STE) on Stiefel manifolds