AI RESEARCH

Enhancing SignSGD: Small-Batch Convergence Analysis and a Hybrid Switching Strategy

arXiv cs.LG

arXiv:2604.25550v1 Announce Type: new

SignSGD compresses each stochastic gradient coordinate to a single bit, offering substantial memory and communication savings, but its 1-bit quantization discards magnitude information and is known to leave a generalization gap relative to well-tuned SGD. We revisit SignSGD from a 1-bit quantization and dithering perspective and contribute three improvements.
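
To make the 1-bit quantization and dithering view concrete, here is a minimal NumPy sketch of a plain sign update next to a textbook uniformly dithered 1-bit quantizer. It is an illustration only, not the paper's algorithm: the function names, the `bound` parameter, and the choice of uniform dither are assumptions introduced here, and the three improvements mentioned in the abstract are not reproduced.

```python
import numpy as np

def sign_compress(grad):
    """Keep one bit per coordinate: +1 for non-negative entries, -1 otherwise.

    np.where is used instead of np.sign so that zeros map to +1 and the
    output really takes only two values.
    """
    return np.where(grad >= 0, 1.0, -1.0)

def signsgd_step(params, grad, lr):
    """One plain SignSGD update: move each coordinate by lr against the sign."""
    return params - lr * sign_compress(grad)

def dithered_sign(grad, bound, rng):
    """Hypothetical dithered 1-bit quantizer (assumption, not from the paper).

    Adds uniform noise on [-bound, bound] before taking the sign. For
    |grad| <= bound, bound * dithered_sign(grad) equals grad in expectation,
    so magnitude information survives on average.
    """
    dither = rng.uniform(-bound, bound, size=grad.shape)
    return np.where(grad + dither >= 0, 1.0, -1.0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    g = np.array([0.5, -0.25])
    print("plain sign:", sign_compress(g))
    # Averaging many dithered bits approximately recovers the gradient.
    samples = dithered_sign(np.tile(g, (200_000, 1)), 1.0, rng)
    print("dithered mean:", samples.mean(axis=0))
```

The plain sign maps both `0.5` and `-0.25` to unit-magnitude values, which is the lost magnitude information the abstract refers to. The dithered variant trades extra variance for an estimate that is unbiased within the assumed bound.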