Squeeze even more performance on MLX

r/LocalLLaMA
Generative AI AI Hardware

AFM MLX has been optimized to squeeze even performance on MacOs than the Python version. It's a 100% native swift and 100% open source. To install: brew install scouzi1966/afm/afm or pip install macafm To see all features: afm mlx -h Batch mode. With concurrent connections, you can get a lot tokens generated usig multiple connections. This is suitable for multi-agent work with different contexts. AFM vs Python MLX It also has a --enable-prefix-cache flag to avoid wasting GPU resources recalulating the entire context in multiturn conversations with agents.