Intel Arc Pro B70 32GB performance on Qwen3.5-27B@Q4

r/LocalLLaMA
Generative AI AI Hardware Open Source AI AI Tools

Posted something when I initially got the GPU on r/IntelArc. Did not have vllm working at the time, so no real use case numbers. After many nights fighting with vllm, I finally got it to work. Here are some summery. both llama.cpp and llm-scaler-vllm produce ~12tps token generation rate. tensor parallel degrade performance in all fronts (this may have something to do with my PCIe topology) pipeline parallel improves PP, but degrades TG at single query, improve both at high concurrency high concurrency performance is a lot better.