Rethinking Output Alignment For 1-bit Post-Training Quantization of Large Language Models

arXiv:2512.21651v2 Announce Type: replace Large Language Models (LLMs) deliver strong performance across a wide range of NLP tasks, but their massive size hinders deployment on resource-constrained devices. To reduce their computational and memory burden, various compression techniques have been proposed, including quantization, pruning, and knowledge distillation. Among these, post-