How do I specify which GPU to use for the KV cache? How do I offload expert tensors to a specific GPU?

r/LocalLLaMA
AI Hardware

I crossposted this from here and would love it if anyone had an answer. I am looking for a way to offload expert tensors to a specific GPU, and for a way to do the same with the KV cache. The reason is that I have one weak GPU and one strong GPU, and I want only the non-expert tensors on the strong GPU, with everything else on the weaker one.

submitted by /u/milpster