I no longer need a cloud LLM to do quick web research
r/LocalLLaMA
•
Generative AI
Open Source AI
This might be super old news to some people, but I only just recently started using local models due to them only just now meeting my standards for quality. I just want to share the setup I have for web searching/scraping locally. I use Qwen3.5:27B-Q3_K_M on an RTX 4090 with a context length of ~200,000. I get ~40 tk/s and use about 22gb VRAM. I use it through the llama.cpp Web UI, with MCP tools enabled.