Ran some llama.cpp RPC tests to see if it's worth it, and whether 10GbE is needed.
r/LocalLLaMA
Let me first say that I'm not doing anything with parallelism, so if that's what you're after, these benchmarks and tests aren't for you. That said, if you're a hobbyist like me who is wondering whether you can use the GPUs in your other PCs, I have some answers, though I'm still learning. There is probably a better llama.cpp config, but I haven't seen any huge gains. In fact, flash attention seemed to slow things down a bit, so I didn't test with it on. Also, I'm sure someone with better-than-consumer networking could get their latency down, which should improve things. I just don't have that kind of hardware.