Intel Arc Pro B70 32GB performance on Qwen3.5-27B@Q4
r/LocalLLaMA
I posted something on r/IntelArc when I first got the GPU, but I did not have vLLM working at the time, so there were no real use-case numbers. After many nights of fighting with vLLM, I finally got it to work. Here is a summary:

- Both llama.cpp and llm-scaler-vllm produce ~12 tps token generation.
- Tensor parallel degrades performance on all fronts (this may have something to do with my PCIe topology).
- Pipeline parallel improves PP (prompt processing) but degrades TG (token generation) for a single query; it improves both at high concurrency.
- High-concurrency performance is a lot better.
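For anyone comparing single-query and high-concurrency numbers, it helps to separate per-request token rate from aggregate throughput, since the two can move in opposite directions. Below is a minimal sketch of that arithmetic. The `RequestTiming` record and both function names are hypothetical helpers, not part of llama.cpp or vLLM, and the timings in the example are made up for illustration, not measurements from this card.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RequestTiming:
    """Hypothetical record of one request's generation phase."""
    start_s: float          # wall-clock time when generation started
    end_s: float            # wall-clock time when generation finished
    generated_tokens: int   # number of tokens generated


def per_request_tps(r: RequestTiming) -> float:
    """Token generation rate as seen by a single request."""
    return r.generated_tokens / (r.end_s - r.start_s)


def aggregate_tps(requests: List[RequestTiming]) -> float:
    """Total tokens generated divided by the wall-clock span of the batch."""
    wall_s = max(r.end_s for r in requests) - min(r.start_s for r in requests)
    return sum(r.generated_tokens for r in requests) / wall_s


if __name__ == "__main__":
    # Illustrative numbers only: one request alone, then 8 requests in parallel.
    single = [RequestTiming(0.0, 10.0, 120)]
    concurrent = [RequestTiming(0.0, 20.0, 120) for _ in range(8)]

    print("single, per request:", per_request_tps(single[0]))
    print("concurrent, per request:", per_request_tps(concurrent[0]))
    print("concurrent, aggregate:", aggregate_tps(concurrent))
```

In this made-up case each concurrent request runs slower than the lone request, yet the aggregate rate is several times higher, which is the sense in which "high concurrency is better" is usually meant.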