Strix Halo + eGPU RTX 5070 Ti via OCuLink in llama.cpp: Benchmarks and Conclusions (Part 2)
r/LocalLLaMA
Hello everyone! Based on the community's feedback on my previous post, I decided to write this follow-up to clarify and expand on a few things. Many of you asked for benchmarks in the comments, so I'll start with benchmarks for current models. I benchmarked Qwen3.5-27B-UD-Q4_K_XL.gguf, distributing the layers (tensor split) between the APU and the eGPU in 10% increments, from 100%/0% to 0%/100%. Later in the post, I'll show why running these benchmarks wasn't strictly necessary.
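For anyone who wants to reproduce a sweep like this, here is a rough sketch of how the eleven split points could be generated. It only prints `llama-bench` command lines and doesn't run anything. The helper names, the `-ngl 99` value, and the device order (APU first, eGPU second) are my assumptions for illustration, so check how your build enumerates devices before trusting the ratios.

```python
# Sketch: build llama-bench command lines for a tensor-split sweep.
# Assumptions (not taken from the actual runs): device 0 is the APU,
# device 1 is the eGPU, and all layers are offloaded with -ngl 99.

def tensor_splits(step=10):
    """Return (apu, egpu) percentage pairs from 100/0 down to 0/100."""
    return [(100 - p, p) for p in range(0, 101, step)]

def bench_command(model, apu, egpu):
    """Build one llama-bench invocation as an argument list."""
    # -ts takes relative proportions per device, separated by "/"
    return ["llama-bench", "-m", model, "-ngl", "99", "-ts", f"{apu}/{egpu}"]

MODEL = "Qwen3.5-27B-UD-Q4_K_XL.gguf"

for apu, egpu in tensor_splits():
    print(" ".join(bench_command(MODEL, apu, egpu)))
```

Each printed line can be pasted into a shell, or the argument list can be handed to `subprocess.run` if you'd rather automate the whole sweep.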