Mapping GPUs to LLMs (and back): A bandwidth-based estimator for local inference

r/LocalLLaMA
Generative AI

Submitted by /u/alexp_lt