We can use continuous batching for an agent swarm to drastically reduce research or coding time.

r/LocalLLaMA

We can use continuous batching for an agent swarm to seriously cut research time. Here's the performance I found for Qwen 27B on the Intel B70 32GB card. If you just chat one-on-one, you get:

- avg prompt throughput: 85.4 tokens/s
- avg generation throughput: 13.4 tokens/s

At those rates, 50 tasks (51,200 input tokens and 25,600 generated tokens in total) take 42 minutes of your life.

The move is an agent swarm: 1 orchestrator and 49 agents all working at once. With continuous batching, the server folds every concurrent request into the running batch, so the GPU processes all the prompts together instead of one after another. Total throughput hits about 1100 tokens a second.
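Here's a quick back-of-envelope check of those numbers. It reproduces the 42-minute sequential figure from the two single-stream rates. The post doesn't give a wall-clock time for the swarm, so the swarm estimate below rests on an assumption: that the ~1100 tokens/s figure is aggregate throughput covering both prompt and generated tokens. Treat that part as a rough sketch, not a measurement.

```python
# Back-of-envelope timing for the numbers quoted in the post.
# ASSUMPTION (not stated in the post): the ~1100 tok/s swarm figure is
# aggregate throughput covering both prompt and generated tokens.

TASKS = 50
INPUT_TOKENS = 51_200    # total prompt tokens across all tasks
OUTPUT_TOKENS = 25_600   # total generated tokens across all tasks

# Single-stream (one-on-one chat) rates quoted in the post
PROMPT_TPS = 85.4
GEN_TPS = 13.4

# Aggregate rate with 50 concurrent agents (assumed to cover all tokens)
SWARM_TPS = 1100.0


def sequential_seconds() -> float:
    """Tasks run one after another: prefill and decode at single-stream rates."""
    return INPUT_TOKENS / PROMPT_TPS + OUTPUT_TOKENS / GEN_TPS


def swarm_seconds() -> float:
    """All tasks batched together at the assumed aggregate rate."""
    return (INPUT_TOKENS + OUTPUT_TOKENS) / SWARM_TPS


if __name__ == "__main__":
    seq = sequential_seconds()
    swarm = swarm_seconds()
    print(f"per task:   {INPUT_TOKENS // TASKS} in / {OUTPUT_TOKENS // TASKS} out")
    print(f"sequential: {seq / 60:.1f} min")
    print(f"swarm:      {swarm / 60:.1f} min")
    print(f"speedup:    {seq / swarm:.0f}x")
```

The sequential path works out to about 41.8 minutes (roughly 10 minutes of prefill plus 32 minutes of decode), matching the 42 minutes above. Under the stated assumption the swarm finishes the same 76,800 tokens in a little over a minute; if the 1100 tokens/s figure counts only generated tokens, the real number will differ.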