AI RESEARCH
When Quantization Is Free: An int4 KV Cache That Outruns fp16 on Apple Silicon
arXiv CS.AI
arXiv:2605.05699v1 Announce Type: cross
KV-cache quantization is framed as a quality-latency trade-off.