Apple's MLX Runs Local LLMs 3x Faster Than llama.cpp — Until Your Context Hits 40K

Towards AI

Ollama just got 93% faster on every Apple Silicon Mac, and it did it without touching the model, the quantization, or the hardware.