How many of you have tried BeeLlama.cpp? How is it? Is agentic coding possible with 8GB VRAM?
r/LocalLLaMA
We'll be getting those features (check the bottom link) on mainline sooner or later anyway. But for now this fork could be useful for seeing the full potential of our poor GPUs (and big GPUs too). Are any 8GB VRAM (and 32GB RAM) folks already doing agentic coding with models (at Q4 or higher) like Qwen3.6-35B-A3B, Qwen3.6-27B, Gemma-4-31B, or Gemma-4-26B-A4B? I would love to see some t/s stats, full commands, and details on that. I'm not expecting any miracles with 8GB VRAM, but I still want to do something decent within those constraints.