AI RESEARCH
TTQ: Activation-Aware Test-Time Quantization to Accelerate LLM Inference On The Fly
arXiv CS.LG
arXiv:2603.19296v1 Announce Type: new To tackle the huge computational demand of large foundation models, activation-aware compression techniques without re