CLI RAG tool that works with Ollama.
r/LocalLLaMA
It's a Python CLI that detects your Ollama instance automatically. Most of my time went into the retrieval pipeline, though, which goes beyond plain vector search. Hybrid retrieval combines BM25 with semantic vectors, then a cross-encoder reranker (ms-marco-MiniLM-L-6-v2) scores each candidate passage against the query independently. Query decomposition splits multi-part questions into separate searches, and coreference resolution handles follow-ups using conversation history. Both are heuristic, so there are no extra LLM calls.
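
For anyone wondering what the hybrid step can look like, here is a rough, simplified sketch. It is illustrative only and not lifted from the tool: the fusion method (reciprocal rank fusion), the function names, and the whitespace tokenizer are all assumptions made for the example, and the vector ranking is a hard-coded placeholder standing in for a real embedding search.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25 (whitespace tokens)."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def rank(scores):
    """Return document indices ordered from best to worst score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def rrf(rankings, k=60):
    """Reciprocal rank fusion: each ranking adds 1 / (k + position) per document."""
    fused = Counter()
    for ranking in rankings:
        for position, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + position)
    return [doc_id for doc_id, _ in fused.most_common()]


docs = [
    "ollama runs local models",
    "bm25 is a lexical ranking function",
    "cats sleep all day",
]
lexical_ranking = rank(bm25_scores("bm25 ranking", docs))
# Placeholder: a real pipeline would rank by embedding similarity here.
vector_ranking = [1, 2, 0]
candidates = rrf([lexical_ranking, vector_ranking])
print(candidates)
```

The fused candidate list is what would then go to the reranker. With the sentence-transformers library, that step is typically a `CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")` calling `predict` on (query, passage) pairs, though whether the tool uses that library is also an assumption.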