AI RESEARCH

When Quantization Is Free: An int4 KV Cache That Outruns fp16 on Apple Silicon

arXiv CS.AI

arXiv:2605.05699v1 Announce Type: cross

KV-cache quantization is framed as a quality-latency trade-off.