Zero text between my agents – latent transfer now works cross-model
r/LocalLLaMA
•
Generative AI
AI Hardware
Open Source AI
AI Tools
I posted about AVP here a few weeks ago - agents passing KV-cache to each other instead of text. Good discussion, a lot of questions about what benchmarks I actually used and how prefix caching fits in. Since then, I ran proper benchmarks on A100 (HumanEval, GSM8K, MATH, DebugBench, HotpotQA - n=164-500), got cross-model working, and made a Colab notebook so you can actually try it (free T4, ~8 min). Heads up - this only works with HuggingFace Transformers + GPU right now. No llama.cpp, no Ollama, no cloud APIs. It needs direct access to model internals. Quantized models untested.