Apple's MLX Runs Local LLMs 3x Faster Than llama.cpp — Until Your Context Hits 40K
Towards AI
Ollama just got 93% faster on every Apple Silicon Mac, and it did it without touching the model, the quantization, or the hardware.