Llama.cpp auto-tuning optimization script
r/LocalLLaMA
I created an auto-tuning script for llama.cpp and ik_llama.cpp that gets you the maximum tokens per second on weird setups like mine (3090 Ti + 4070 + 3060). No flag configuration, no OOM crashing, yay!

submitted by /u/raketenkater
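The post does not say how the script searches for a configuration, so the following is only a rough sketch of what auto-tuning across mismatched GPUs could look like, not the author's method. It tries candidate layer splits, skips any that would run out of memory, and keeps the fastest. Every name here (`autotune`, `toy_benchmark`, `candidate_splits`) and every number (capacities, speeds, layer count) is hypothetical; a real tuner would replace `toy_benchmark` with something that actually launches the model and measures tokens per second.

```python
from itertools import product
from typing import Callable, Iterable, Optional, Tuple

Config = Tuple[int, ...]  # hypothetical: number of layers placed on each GPU

TOTAL_LAYERS = 40          # made-up model depth
CAPACITY = (24, 12, 12)    # made-up max layers that fit on each GPU
SPEED = (10.0, 6.0, 4.0)   # made-up relative layer throughput per GPU
CPU_SPEED = 1.0            # layers left over run on the (slow) CPU


def candidate_splits(total: int, n_gpus: int, step: int) -> Iterable[Config]:
    """Yield every per-GPU layer count in multiples of `step` with sum <= total."""
    for cfg in product(range(0, total + 1, step), repeat=n_gpus):
        if sum(cfg) <= total:
            yield cfg


def toy_benchmark(cfg: Config) -> Optional[float]:
    """Stand-in for a real run: returns tokens/s, or None to simulate an OOM."""
    if any(n > cap for n, cap in zip(cfg, CAPACITY)):
        return None  # this split would not fit in memory
    cpu_layers = TOTAL_LAYERS - sum(cfg)
    time_per_token = sum(n / s for n, s in zip(cfg, SPEED)) + cpu_layers / CPU_SPEED
    return 1.0 / time_per_token


def autotune(
    candidates: Iterable[Config],
    benchmark: Callable[[Config], Optional[float]],
) -> Tuple[Optional[Config], float]:
    """Try each candidate, skip failures, and return the fastest one found."""
    best_cfg, best_tps = None, 0.0
    for cfg in candidates:
        tps = benchmark(cfg)
        if tps is None:
            continue  # treated as an OOM or failed launch
        if tps > best_tps:
            best_cfg, best_tps = cfg, tps
    return best_cfg, best_tps


best_cfg, best_tps = autotune(candidate_splits(TOTAL_LAYERS, 3, 4), toy_benchmark)
print(best_cfg, round(best_tps, 3))
```

In this toy model the search fills the fastest card first and spills the remainder onto the slower ones, which is the kind of result you would hope for on an uneven setup. An exhaustive grid is used only because it is easy to read; a real tuner would likely need a smarter search, since each real benchmark run is slow.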