GAMMA: Global Bit Allocation for Mixed-Precision Models under Arbitrary Budgets

arXiv:2605.18475v1

Mixed-precision quantization improves the budget–accuracy trade-off for large language models (LLMs) by allocating bits to sensitive modules. However, automating this allocation at LLM scale faces a unique combination of constraints: learnable approaches require quantization-aware