LLM386: borrowing a 1990s idea for managing LLM context

Dev.to AI
Generative AI

In 1989, DOS had a 640 KB ceiling on conventional memory. EMM386 used the 80386 CPU's paging hardware to map 16 KB pages of a much larger memory pool into a small fixed window: the 64 KB EMS page frame, which sat in the upper memory area between 640 KB and 1 MB. Programs that used the EMS interface could work with many megabytes of data through that peephole, because only the pages relevant to the current operation had to be mapped in at any given moment.

LLMs have the same problem. The context window is bounded: 32K, 128K, 1M tokens. Your data is bigger. Conversation history, retrieved documents, tool results, and persistent facts will eventually exceed any window worth paying for.
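To make the analogy concrete, here is a minimal sketch of "paging through a peephole" applied to context. It keeps every chunk in a large backing store and maps into a fixed token budget only the chunks relevant to the current query. `Chunk`, `page_in`, `relevance`, and `approx_tokens` are hypothetical names, not LLM386's actual API. Word overlap stands in for a real relevance signal (embeddings, recency, pinned facts), and a whitespace word count stands in for a real tokenizer.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    # Hypothetical unit of "paged" context: one message, document slice, or tool result.
    chunk_id: str
    text: str


def approx_tokens(text: str) -> int:
    # Assumption: one token per whitespace-separated word (a real system would use its tokenizer).
    return len(text.split())


def relevance(query: str, chunk: Chunk) -> int:
    # Assumption: toy score = number of distinct query words that also appear in the chunk.
    return len(set(query.lower().split()) & set(chunk.text.lower().split()))


def page_in(store: list[Chunk], query: str, window_tokens: int) -> list[Chunk]:
    """Map the most relevant chunks from the backing store into a fixed-size window."""
    ranked = sorted(store, key=lambda ch: relevance(query, ch), reverse=True)
    window: list[Chunk] = []
    used = 0
    for ch in ranked:
        if relevance(query, ch) == 0:
            break  # sorted descending, so every remaining chunk is irrelevant too
        cost = approx_tokens(ch.text)
        if used + cost <= window_tokens:  # skip chunks that would overflow the window
            window.append(ch)
            used += cost
    return window


store = [
    Chunk("a", "the user prefers metric units"),
    Chunk("b", "tool result: weather in Oslo is 4 degrees"),
    Chunk("c", "earlier we discussed DOS memory managers at length"),
]
print([ch.chunk_id for ch in page_in(store, "what is the weather in Oslo", window_tokens=12)])
```

With a 12-token window, only chunk `b` (8 tokens, highest overlap) fits; chunk `a` is relevant but would overflow the budget, and chunk `c` is never mapped in. Everything stays in the store, just as EMS pages stayed in extended memory when they were not mapped into the page frame.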