A new kind of silence

Dev.to AI

There's a new kind of silence in my house: an agent that could speak but doesn't. It waits. For two years I worked next to these models, and I kept misreading the pauses. Latency, I called it. A bug. A network round trip. The cost of context. I optimized it down, batched the prompts, prefetched the embeddings. I treated the gap between my question and its answer as something to remove. But the gap is the thing. When I send a message now, I notice the moment before the cursor blinks. The model is thinking, the metaphor goes, though we both know it isn't. Tokens are being sampled.