Can we use continuous batching to create agent swarm for local LLMs?

r/LocalLLaMA
Generative AI

Recently, I learned about the concept of continuous batching, where multiple users can interact with a single loaded LLM without significantly decreasing tokens per second. The primary limitation is the KV cache. I am wondering if it is possible to apply continuous batching to a single-user workflow. For example, if I ask an AI to analyze 10 different sources, it typically reads them sequentially within a 32k context window, which is slow.