Qwen 3.5 28B A3B REAP for coding: initial impressions
r/LocalLLaMA
Open Source AI
This is a follow-up. Given the comments I've reviewed, I'd guess Qwen 3.5 (and Gemma 4) are deemed among the best models published for public consumption. The original models are on Hugging Face, and unsloth contributed various quants.

Among the models I tried, on my plain old Haswell i7 CPU with 32 GB DRAM, all as Q4_K_M quants:

unsloth/Qwen3.5-27B-GGUF: 0.95 tokens/s
unsloth/Qwen3.5-35B-A3B-GGUF: 4 tokens/s
barozp/Qwen-3.5-28B-A3B-REAP-GGUF: 7.5 tokens/s

Tokens/s degrades as the context becomes larger, e.g. when following up with more prompts in the same context / thread.
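To put those speeds in perspective, here is a rough back-of-the-envelope sketch that computes the relative speedups and how long a reply would take at each rate. The 500-token reply length and the helper names (`speedup`, `seconds_for_reply`) are assumptions for illustration only, not anything measured. It also ignores prompt processing time and the slowdown at larger context, so real waits would be longer.

```python
# Generation speeds reported in the post (Q4_K_M quants, Haswell i7, 32 GB DRAM).
SPEEDS_TOK_PER_S = {
    "unsloth/Qwen3.5-27B-GGUF": 0.95,
    "unsloth/Qwen3.5-35B-A3B-GGUF": 4.0,
    "barozp/Qwen-3.5-28B-A3B-REAP-GGUF": 7.5,
}


def speedup(model, baseline, speeds=SPEEDS_TOK_PER_S):
    """How many times faster `model` generates than `baseline`."""
    return speeds[model] / speeds[baseline]


def seconds_for_reply(model, n_tokens, speeds=SPEEDS_TOK_PER_S):
    """Generation time for n_tokens at the reported rate (no prompt processing)."""
    return n_tokens / speeds[model]


if __name__ == "__main__":
    REPLY_TOKENS = 500  # hypothetical reply length, chosen for illustration
    reap = "barozp/Qwen-3.5-28B-A3B-REAP-GGUF"
    for name in SPEEDS_TOK_PER_S:
        minutes = seconds_for_reply(name, REPLY_TOKENS) / 60
        print(f"{name}: {minutes:.1f} min for {REPLY_TOKENS} tokens, "
              f"REAP is {speedup(reap, name):.2f}x as fast")
```

By this arithmetic the REAP quant is a bit under 2x the speed of the 35B-A3B and roughly 8x the dense 27B, which on CPU is the difference between waiting about a minute and waiting close to nine minutes for a reply of that size.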