Question: Prompt format for memory injection (local offline AI assistant, 6GB VRAM)?
r/LocalLLaMA
Hi there! My question(s) are at the bottom, but first let me explain what I am trying to do and how. For my work-in-progress offline AI assistant I implemented a very simple memory system that stores statements ("memories") extracted from earlier chats in an SQLite database. In a later chat, each time the user enters a prompt, the system retrieves the most relevant of these "memories" via cosine similarity between embedding vectors, followed by reranking (right now I am using snowflake-arctic-embed-s Q8_0 for embeddings and bge-reranker-v2-m3 Q5_k_m for reranking).
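To make the pipeline concrete, here is a minimal sketch of the retrieval step in Python. It is not my actual code: the table layout, the function names (`open_store`, `add_memory`, `retrieve`) and the `toy_embed` stand-in are all made up for illustration, and `embed` / `rerank` are placeholders for whatever calls the real embedding and reranker models.

```python
import json
import math
import sqlite3


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def open_store(path=":memory:"):
    """Open (or create) the memory database. The schema is a hypothetical example."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS memories ("
        "id INTEGER PRIMARY KEY, text TEXT NOT NULL, embedding TEXT NOT NULL)"
    )
    return conn


def add_memory(conn, text, embed):
    """Store one memory together with its embedding, serialized as JSON."""
    conn.execute(
        "INSERT INTO memories (text, embedding) VALUES (?, ?)",
        (text, json.dumps(embed(text))),
    )
    conn.commit()


def retrieve(conn, query, embed, rerank=None, k=20, top_n=3):
    """Return the top_n memories for a query.

    Stage 1: rank all memories by cosine similarity and keep the best k.
    Stage 2 (optional): reorder those k with rerank(query, text) -> score.
    """
    q = embed(query)
    rows = conn.execute("SELECT text, embedding FROM memories").fetchall()
    ranked = sorted(rows, key=lambda r: cosine(q, json.loads(r[1])), reverse=True)
    candidates = [text for text, _ in ranked[:k]]
    if rerank is not None:
        candidates = sorted(candidates, key=lambda t: rerank(query, t), reverse=True)
    return candidates[:top_n]


def toy_embed(text):
    """Stand-in embedding (letter counts) so the sketch runs; NOT a real model."""
    text = text.lower()
    return [text.count(c) for c in "abcdefghijklmnopqrstuvwxyz"]
```

In the real system, `embed` would wrap the embedding model and `rerank` the reranker; the brute-force scan over all rows is only reasonable while the number of stored memories stays small.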