Running Qwen3.5-27B Q5 split across a 4070 Ti and an AMD RX 6800 over LAN @ 13 t/s with a 32k prompt

r/LocalLLaMA
AI Hardware

I don't know why I haven't seen the rpc-server thing before, but what a game changer! I've been using smaller models for a while now because I'm GPU-poor, and a 27B dense model has been out of the question at any kind of reasonable speed. I love the Qwen3.5 family. I love everyone who has ever contributed to llama.cpp. I love Unsloth. And everyone else! :D My setup is a 12GB 4070 Ti with an i7-14700K and 64GB of DDR4-3600 in one computer, and a 16GB RX 6800 with an i5-11600K and 48GB of DDR4-3200 in the other.
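
For anyone wondering why the two cards together make this work, here is a rough back-of-the-envelope sketch in Python. It assumes about 5.5 bits per weight for a Q5 quant (my guess, the exact figure depends on the quant variant) and splits the weights in proportion to each card's VRAM. The helper names are made up for illustration, and the KV cache for a 32k context plus runtime overhead are not counted, so treat the headroom numbers as optimistic.

```python
# Rough VRAM budget for a quantized model split across two GPUs.
# Assumptions (not from the post): ~5.5 bits per weight for Q5,
# weights split in proportion to VRAM, KV cache and overhead ignored.

def model_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def proportional_split(vram_gb: list[float]) -> list[float]:
    """Fraction of the model assigned to each GPU, proportional to its VRAM."""
    total = sum(vram_gb)
    return [v / total for v in vram_gb]


vram = [12.0, 16.0]                       # 4070 Ti, RX 6800
weights = model_size_gb(27, 5.5)          # 27B params at the assumed 5.5 bpw
shares = proportional_split(vram)
per_gpu = [weights * s for s in shares]   # GB of weights on each card
headroom = [v - w for v, w in zip(vram, per_gpu)]

for name, w, h in zip(["4070 Ti", "RX 6800"], per_gpu, headroom):
    print(f"{name}: {w:.1f} GB weights, {h:.1f} GB left for KV cache and overhead")
```

Under those assumptions the weights alone are too big for either card but fit comfortably in the combined 28GB, which is the whole point of splitting over the LAN.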