What is your actual local LLM stack right now?
r/LocalLLaMA
I keep trying new models, but the bigger difference usually comes from the setup around them, not the model itself.

Backend
Frontend
RAG or no RAG
Quant choice
GPU offload
Context settings
Prompt format
Whatever janky glue holds it together

A lot of local setups look great in screenshots, then feel annoying in real use after two days. Right now I am more interested in stacks that people actually stick with than in benchmark wins.

What are you running daily, and what part of your setup ended up mattering way more than expected?

submitted by /u/Ryannnnnnnnnnnnnnnh