New - Apple Neural Engine (ANE) backend for llama.cpp

r/LocalLLaMA

This just showed up a couple of days ago on GitHub. Note that ANE is the NPU in all Apple Silicon, not the new 'Neural Accelerator' GPU cores that are only in M5.

(ggml-org/llama.cpp) - Comment by arozano:

Built a working ggml ANE backend. Dispatches MUL_MAT to ANE via private API.

M4 Pro results:
4.0 TFLOPS peak at N=256, 16.8x faster than CPU
MIL-side transpose, kernel cache, quantized weight
ANE for prefill (N>=64), Metal/CPU for decode

Code: Based on maderix/ANE bridge.

submitted by /u/PracticlySpeaking
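
For anyone wondering what "ANE for prefill (N>=64), Metal/CPU for decode" amounts to, here is a rough sketch of that routing rule. It is only an illustration, not the backend's actual code: the function name, the constant, and the reading of N as the number of tokens in the batch are my assumptions. The only parts taken from the comment are that MUL_MAT is the op sent to the ANE and that the cutoff is N>=64.

```python
# Hypothetical sketch of the routing rule described in the comment.
# pick_backend and ANE_MIN_BATCH are illustrative names, not from the real code.

ANE_MIN_BATCH = 64  # from the comment: ANE is used for prefill when N >= 64


def pick_backend(op: str, n_tokens: int) -> str:
    """Return which device would handle an op under the stated rule.

    Assumption: N in the comment is the token batch size, so prefill
    (many tokens at once) clears the threshold and decode (one token
    at a time) does not.
    """
    if op == "MUL_MAT" and n_tokens >= ANE_MIN_BATCH:
        return "ANE"       # large-batch matmul, i.e. prefill
    return "Metal/CPU"     # decode, small batches, and every other op


if __name__ == "__main__":
    for n in (1, 63, 64, 256):
        print(f"MUL_MAT with N={n}: {pick_backend('MUL_MAT', n)}")
```

The point of a cutoff like this is presumably that small matmuls don't amortize the cost of going through the ANE, so single-token decode stays on Metal or the CPU.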