MTP+GGML_CUDA_ENABLE_UNIFIED_MEMORY=1 - llama.cpp
r/LocalLLaMA
I was wondering what the difference in results would be between running with GGML_CUDA_ENABLE_UNIFIED_MEMORY=1 alone and running with MTP plus GGML_CUDA_ENABLE_UNIFIED_MEMORY=1. The results are quite interesting: 49 tok/sec without MTP vs. 64 tok/sec with MTP.
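
For anyone who wants the relative gain rather than the raw numbers, here is a quick sketch of the arithmetic in Python. The two throughput figures are the ones from the post; the function and variable names are just made up for illustration, and this is a single pair of measurements, so treat the percentage as a rough indication only.

```python
def speedup(baseline_tps: float, new_tps: float) -> float:
    """Return the throughput ratio new / baseline."""
    if baseline_tps <= 0:
        raise ValueError("baseline throughput must be positive")
    return new_tps / baseline_tps


# Figures reported in the post (tokens per second).
baseline = 49.0  # GGML_CUDA_ENABLE_UNIFIED_MEMORY=1 only
with_mtp = 64.0  # MTP + GGML_CUDA_ENABLE_UNIFIED_MEMORY=1

ratio = speedup(baseline, with_mtp)
percent_gain = (ratio - 1.0) * 100.0

print(f"{ratio:.2f}x, +{percent_gain:.1f}%")  # prints "1.31x, +30.6%"
```

So on these numbers MTP gives roughly a 1.3x generation speed over unified memory alone.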