Zero text between my agents – latent transfer now works cross-model

r/LocalLLaMA
Generative AI AI Hardware Open Source AI AI Tools

I posted about AVP here a few weeks ago - agents passing KV-cache to each other instead of text. Good discussion, a lot of questions about what benchmarks I actually used and how prefix caching fits in. Since then, I ran proper benchmarks on A100 (HumanEval, GSM8K, MATH, DebugBench, HotpotQA - n=164-500), got cross-model working, and made a Colab notebook so you can actually try it (free T4, ~8 min). Heads up - this only works with HuggingFace Transformers + GPU right now. No llama.cpp, no Ollama, no cloud APIs. It needs direct access to model internals. Quantized models untested.