TORQ: Two-Level Orthogonal Rotation for MXFP4 Quantization

arXiv:2605.19561v1

As Large Language Models (LLMs) advance toward practical deployment, the Microscaling FP4 (MXFP4) format has emerged as a cornerstone of next-generation low-bit inference, owing to its ability to balance a high dynamic range with hardware efficiency. However, directly applying MXFP4 to LLM activation quantization inevitably leads to significant accuracy degradation.
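The following is a minimal sketch of MXFP4 fake quantization, written from our reading of the OCP Microscaling (MX) v1.0 specification. Each block of 32 elements shares one power-of-two (E8M0) scale, and each element is stored as FP4 E2M1. It is not TORQ's method, and it does not show any rotation. It only illustrates one way to see why a single large activation in a block can hurt accuracy: the shared scale is set by the block maximum, so small neighbors may be flushed to zero. The function names (`quantize_fp4`, `mxfp4_block`, `mxfp4_quantize`) are hypothetical, and NaN/Inf handling from the spec is omitted.

```python
import math

# Non-negative FP4 E2M1 magnitudes; list index parity equals the mantissa bit.
FP4_E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
FP4_EMAX = 2      # exponent of the largest E2M1 value: 6 = 1.5 * 2**2
BLOCK_SIZE = 32   # MX block size: 32 elements share one scale


def quantize_fp4(x: float) -> float:
    """Round an already-scaled value to E2M1: nearest, ties to even, saturate at +/-6."""
    mag = min(abs(x), FP4_E2M1_GRID[-1])
    # Sort key: distance first, then prefer an even mantissa (even index) on ties.
    idx = min(range(len(FP4_E2M1_GRID)),
              key=lambda i: (abs(FP4_E2M1_GRID[i] - mag), i % 2))
    return math.copysign(FP4_E2M1_GRID[idx], x)


def mxfp4_block(block):
    """Fake-quantize one block with a shared power-of-two scale (assumed E8M0 rule)."""
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return [0.0] * len(block)
    # frexp gives amax = m * 2**e with 0.5 <= m < 1, so floor(log2(amax)) = e - 1.
    _, e = math.frexp(amax)
    shared_exp = (e - 1) - FP4_EMAX
    shared_exp = max(-127, min(127, shared_exp))  # clamp to the E8M0 exponent range
    scale = 2.0 ** shared_exp
    return [quantize_fp4(v / scale) * scale for v in block]


def mxfp4_quantize(values):
    """Split a flat vector into 32-element blocks and fake-quantize each one."""
    out = []
    for start in range(0, len(values), BLOCK_SIZE):
        out.extend(mxfp4_block(values[start:start + BLOCK_SIZE]))
    return out


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Usage: a block of small activations, and the same block with one outlier.
smooth = [0.1 * ((i % 7) - 3) for i in range(32)]  # values in [-0.3, 0.3]
with_outlier = smooth[:-1] + [20.0]                # last element replaced by an outlier

q_smooth = mxfp4_quantize(smooth)
q_outlier = mxfp4_quantize(with_outlier)

# With the outlier, the shared scale grows to 4, so every small value rounds to zero.
err_smooth = mse(smooth[:31], q_smooth[:31])
err_outlier = mse(with_outlier[:31], q_outlier[:31])
print(f"MSE on the 31 small values: smooth={err_smooth:.6f}, with outlier={err_outlier:.6f}")
```

Because the scale is restricted to a power of two and is shared across all 32 elements, the block maximum alone decides how much of the tiny 8-value E2M1 grid is left for the remaining elements. Rotation-based approaches in general aim to spread such outliers before quantization. How TORQ's two-level rotation does this is not described in this excerpt.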