Very happy with Qwen 3.5 122B output. But is slowness expected?
r/LocalLLaMA
I'm running the 122-billion-parameter Qwen 3.5, specifically Qwen3.5-122B-A10B-Q5_K_M, on a DGX Spark (128 GB unified memory). I'm (very!) impressed with its general knowledge. I can talk to it in multiple languages, and I don't feel the need to consult online frontier models for encyclopaedic, general "handyman" or other day-to-day questions. My local Qwen seems sufficient. That said, generation seems slow, at around 19 tokens/s. Is this speed expected? I'm serving the model with llama-server (built from the latest source as of yesterday), and the chat UI is Open WebUI.
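For context, here is a rough back-of-envelope sketch of the ceiling I would expect if decoding is purely memory-bandwidth bound. Everything in it is an assumption rather than a measurement: I take the "A10B" in the model name to mean about 10 billion active parameters per token, treat Q5_K_M as roughly 5.5 bits per weight, and use 273 GB/s as the Spark's memory bandwidth from the published spec. The function name `decode_ceiling_tps` is just something I made up for illustration. It ignores KV-cache reads, compute time and any runtime overhead, so the real ceiling is lower than what it prints.

```python
def decode_ceiling_tps(active_params: float, bits_per_weight: float,
                       bandwidth_gb_s: float) -> float:
    """Upper bound on decode speed if every active weight is read once per token."""
    # Bytes of weights streamed from memory for each generated token.
    bytes_per_token = active_params * bits_per_weight / 8
    # Tokens per second if memory bandwidth were the only limit.
    return bandwidth_gb_s * 1e9 / bytes_per_token


# All three inputs are assumptions, not measured values.
ceiling = decode_ceiling_tps(active_params=10e9,    # assumed from "A10B"
                             bits_per_weight=5.5,   # rough figure for Q5_K_M
                             bandwidth_gb_s=273.0)  # assumed spec bandwidth
observed = 19.0  # tokens/s reported by my setup

print(f"ceiling ~{ceiling:.1f} tok/s, observed is {observed / ceiling:.0%} of it")
```

With those numbers the ceiling lands at about 40 tokens/s, so 19 tokens/s would be roughly half of an optimistic upper bound. I don't know whether that gap is normal for this setup or a sign that something is misconfigured.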