The Missing Memory Hierarchy: Demand Paging for LLM Context Windows

arXiv:2603.09023v1 Announce Type: cross

The context window of a large language model is not memory. It is L1 cache: a small, fast, expensive resource that the field treats as the entire memory system. There is no L2, no virtual memory, no paging. Every tool definition, every system prompt, and every stale tool result occupies context for the lifetime of the session. The result is measurable: across 857 production sessions and 4.45M effective input tokens, 21.8% of those tokens are structural waste. We present Pichay, a demand paging system for LLM context windows.
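
To make the cache analogy concrete, the following is a minimal sketch of what demand paging over a context window could look like. It is not Pichay's implementation, which the abstract does not describe. Every name here (`Page`, `ContextPager`, `count_tokens`) is hypothetical, the whitespace token count is a stand-in for a real tokenizer, and least-recently-used eviction is an assumed policy chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    key: str
    text: str
    pinned: bool = False   # e.g. a system prompt that must stay resident
    last_used: int = 0     # turn number of the most recent access


def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words approximate token count.
    return len(text.split())


@dataclass
class ContextPager:
    budget: int                                   # max tokens kept in the window
    resident: dict = field(default_factory=dict)  # key -> Page inside the window
    backing: dict = field(default_factory=dict)   # key -> Page evicted from it
    turn: int = 0
    faults: int = 0

    def resident_tokens(self) -> int:
        return sum(count_tokens(p.text) for p in self.resident.values())

    def add(self, key: str, text: str, pinned: bool = False) -> None:
        """Place new content in the window, evicting older pages if needed."""
        self.turn += 1
        self.resident[key] = Page(key, text, pinned, self.turn)
        self._evict_until_fits(keep=key)

    def access(self, key: str) -> str:
        """Return a page's text, paging it back in on a fault.

        Raises KeyError if the key was never added.
        """
        self.turn += 1
        if key not in self.resident:
            self.faults += 1
            self.resident[key] = self.backing.pop(key)
        page = self.resident[key]
        page.last_used = self.turn
        self._evict_until_fits(keep=key)
        return page.text

    def _evict_until_fits(self, keep: str) -> None:
        # Evict least-recently-used, unpinned pages until under budget.
        # The page named by `keep` is never chosen as a victim.
        while self.resident_tokens() > self.budget:
            victims = [p for p in self.resident.values()
                       if not p.pinned and p.key != keep]
            if not victims:
                break  # nothing evictable; the window stays over budget
            victim = min(victims, key=lambda p: p.last_used)
            self.backing[victim.key] = self.resident.pop(victim.key)


pager = ContextPager(budget=10)
pager.add("system", "you are a helpful agent", pinned=True)
pager.add("tool_a", "a a a a")
pager.add("tool_b", "b b b")         # pushes the window over budget
print(sorted(pager.resident))         # tool_a has been paged out
print(pager.access("tool_a"), pager.faults)
```

In this sketch a stale tool result leaves the window once the budget is exceeded and returns only when something asks for it again, which is the behavior the abstract says current systems lack.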