AI RESEARCH

TileQ: Efficient Low-Rank Quantization of Mixture-of-Experts with 2D Tiling

arXiv cs.LG

arXiv:2605.09281v1 Announce Type: new

Mixture-of-Experts (MoE) models achieve remarkable performance by sparsely activating specialized experts, yet the massive parameter count of their experts poses significant challenges for deployment. While low-rank quantization offers a promising route to compressing MoE models, existing methods still incur non-negligible memory overhead and inference latency. To address these limitations, we propose TileQ, a fine-tuning-free post-