I came from Data Engineering stuff before jumping into LLM stuff, i am surprised that many people in this space never heard Elastic/OpenSearch
r/LocalLLaMA
•
Generative AI
NLP
Data Science
Jokes aside, on a technical level, Google/brave search and vector s basically work in a very similar way. The main difference is scale. From an LLM point of view, both fall under RAG. You can even ignore embedding models entirely and just use TF-IDF or BM25. Elastic and OpenSearch (and technically Lucene) are powerhouses when it comes to this kind of retrieval. You can also enable a small BERT model as a vector embedding, around 100 MB (FP32), running in on CPU, within either Elastic or OpenSearch.