Local LLM autocomplete + agentic coding on a single 16GB GPU + 64GB RAM
r/LocalLLaMA
Today I set up a full coding toolbox on a single RTX 5080 (with RAM offloading) that's actually viable.

Autocomplete: bartowski/Qwen2.5-Coder-7B-Instruct-GGUF:Q6_K_L
Agentic: unsloth/Qwen3.6-35B-A3B-GGUF:UD-Q8_K_XL

Why these models:

Qwen2.5 is still the best model for infill imo. I tried Gemma4 E4B and Qwen3.5 9B/4B, and both produced weird suggestions. With the command below, the autocomplete model takes ~8GB of VRAM, and suggestions show up basically instantly.

Qwen3.6 35B-A3B is actually good at agentic coding at Q8 if you give it a good prompt.
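To see why the agentic model needs RAM offloading on a 16GB card, here is a rough back-of-the-envelope memory budget in Python. It makes several assumptions. The 8.5 bits per weight is the Q8_0 rate, used here as a stand-in for UD-Q8_K_XL, whose real file size will differ. The ~8GB autocomplete footprint is the figure quoted above. The "VRAM left" value is an upper bound because the agentic model's own KV cache and compute buffers also need VRAM. `weight_gb` and `plan` are hypothetical helpers written only for this illustration.

```python
GB = 1e9  # decimal gigabytes, the unit model file sizes are usually quoted in

# Assumed figures for illustration, not measurements:
VRAM_GB = 16.0               # RTX 5080
RAM_GB = 64.0                # system RAM
AUTOCOMPLETE_VRAM_GB = 8.0   # ~8GB reported for the Qwen2.5-Coder 7B autocomplete model
AGENT_PARAMS_B = 35.0        # total parameters of the 35B-A3B MoE (only ~3B active per token)
AGENT_BPW = 8.5              # Q8_0 rate, a stand-in for UD-Q8_K_XL


def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of quantized weights in GB.

    Ignores file metadata and tensors kept at higher precision,
    so real GGUF files will be somewhat larger.
    """
    return params_billion * 1e9 * bits_per_weight / 8 / GB


def plan() -> tuple[float, float, float]:
    """Return (agent weight size, VRAM left for it, amount that spills into RAM)."""
    agent_weights = weight_gb(AGENT_PARAMS_B, AGENT_BPW)
    # Upper bound: the agent's KV cache and buffers also compete for this VRAM.
    vram_left = VRAM_GB - AUTOCOMPLETE_VRAM_GB
    offloaded = max(0.0, agent_weights - vram_left)
    return agent_weights, vram_left, offloaded


if __name__ == "__main__":
    agent, left, off = plan()
    print(f"agent weights ~{agent:.1f} GB, VRAM left ~{left:.1f} GB, "
          f"offloaded to RAM ~{off:.1f} GB")
    print("offloaded part fits in RAM:", off <= RAM_GB)
```

Under these assumptions, roughly 37GB of agent weights compete for at most ~8GB of free VRAM, so most of the model ends up in system RAM. That is feasible with 64GB. An MoE with only ~3B active parameters per token is the kind of model where this kind of offloading stays usable, because each token touches only a small fraction of the offloaded weights.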