I built an open-source memory layer for LLMs — here's how it works

LLMs are stateless by design. You send a message, you get a reply, and the model instantly forgets everything. Every conversation starts cold. That's fine for one-off tasks. It's a real problem when you're building anything personal: a coding assistant that knows your stack, a writing tool that remembers your style, an agent that tracks what you've decided across sessions. The usual answers are to roll your own RAG pipeline, use a cloud memory service, or spend a weekend stitching together embeddings, a vector database, and the logic that injects retrieved context into the prompt. None of those feels like the answer.