AI RESEARCH

TTQ: Activation-Aware Test-Time Quantization to Accelerate LLM Inference On The Fly

arXiv cs.LG

arXiv:2603.19296v1

To tackle the huge computational demand of large foundation models, activation-aware compression techniques without re