Backend-agnostic tensor parallelism has been merged into llama.cpp

r/LocalLLaMA

If you have more than one GPU, your models can now run much faster. -sm layer is still the default behaviour; -sm tensor is the new thing to try. "Backend-agnostic" means you don't need CUDA to benefit from this.

This is experimental, and in your case the results may be poor (try different models). You have been warned!

submitted by /u/jacek2023
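For anyone wondering why the split mode matters: roughly speaking, with layer splitting each GPU holds whole layers and the GPUs take turns, so for a single generation stream only one of them is busy at a time. Tensor splitting instead shards each layer's weights so all GPUs work on the same layer at once and then combine their partial results. Below is a minimal conceptual sketch in plain Python of those two ideas. It is NOT llama.cpp's code; the helpers (matvec, split_rows, tensor_parallel_matvec, layer_split_assignment) are hypothetical names for illustration, the "devices" are just list chunks, and real implementations also have to pay for inter-GPU communication.

```python
# Conceptual sketch only: NOT llama.cpp's implementation. All names are hypothetical.
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Compute y = W x for a row-major matrix W (out_features x in_features)."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def _even_chunks(n: int, parts: int) -> List[range]:
    """Split range(n) into `parts` contiguous, nearly equal chunks."""
    k, r = divmod(n, parts)
    chunks, start = [], 0
    for d in range(parts):
        end = start + k + (1 if d < r else 0)
        chunks.append(range(start, end))
        start = end
    return chunks


def split_rows(w: Matrix, n_devices: int) -> List[Matrix]:
    """Tensor-split style: shard ONE layer's weights by output rows across devices."""
    return [w[c.start:c.stop] for c in _even_chunks(len(w), n_devices)]


def tensor_parallel_matvec(w: Matrix, x: Vector, n_devices: int) -> Vector:
    # Each "device" computes its slice of the output for the same layer.
    # Here the loop runs sequentially; on real GPUs these would run concurrently.
    partials = [matvec(shard, x) for shard in split_rows(w, n_devices)]
    # Gather step: concatenate the partial outputs (inter-GPU traffic in practice).
    return [v for part in partials for v in part]


def layer_split_assignment(n_layers: int, n_devices: int) -> List[List[int]]:
    """Layer-split style (simplified even split): each device owns whole layers."""
    return [list(c) for c in _even_chunks(n_layers, n_devices)]
```

Usage: with w = [[1, 2], [3, 4], [5, 6]] and x = [1, 1], tensor_parallel_matvec(w, x, 2) gives the same result as matvec(w, x), namely [3, 7, 11], while layer_split_assignment(5, 2) puts layers [0, 1, 2] on the first device and [3, 4] on the second. The point of the sketch is that sharding within a layer lets every device contribute to every layer, which is where the potential speedup comes from; whether it actually wins on your setup depends on the model and on how fast your GPUs can exchange data, hence the "try different models" warning.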