Local Qwen3-0.6B INT8 as embedding backbone for an AI memory system

Most AI coding assistants solve the memory problem by calling an embedding API on every write and retrieve. This does not scale. At 15-25 sessions per day, that means hundreds of API calls, latency on every write, and a dependency on a service that can change pricing at any time.

I needed embeddings for a memory lifecycle system that runs inside Claude Code. The system processes knowledge through 5 phases: buffer, connect, consolidate, route, age. Embeddings drive phases 2 through 4 (connection tracking, cluster detection, similarity routing).
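To make the phase layout concrete, here is a minimal sketch of the five phases and which of them depend on embeddings. The names `Phase`, `EMBEDDING_PHASES`, `uses_embeddings`, and `cosine_similarity` are hypothetical and chosen for illustration only. Cosine similarity is also an assumption here: it is a common comparison for embedding vectors, not necessarily the metric this system uses.

```python
import math
from enum import IntEnum


class Phase(IntEnum):
    """The five lifecycle phases, numbered as in the text (hypothetical names)."""
    BUFFER = 1
    CONNECT = 2
    CONSOLIDATE = 3
    ROUTE = 4
    AGE = 5


# Phases 2 through 4 are the ones driven by embeddings, per the text:
# connection tracking, cluster detection, similarity routing.
EMBEDDING_PHASES = frozenset({Phase.CONNECT, Phase.CONSOLIDATE, Phase.ROUTE})


def uses_embeddings(phase: Phase) -> bool:
    """Return True if the given phase needs embedding vectors."""
    return phase in EMBEDDING_PHASES


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (assumed metric)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    for phase in Phase:
        marker = "embeddings" if uses_embeddings(phase) else "no embeddings"
        print(f"{phase.value}. {phase.name.lower()}: {marker}")
```

Only three of the five phases touch the embedding model in this sketch, so buffering and aging add no embedding cost at all.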