MiniMax 4bit (120gb) MLX - 26.5% (MMLU 200q) while JANG_2S (60gb) gets 74% - GGUF for MLX
r/LocalLLaMA
•
Generative AI
Open Source AI
People trade the M chip speed for coherency, with no GGUF equivalent on MLX (qwen 3.5 on macs when using gguf is also 1/3rd slower than MLX) so I decided to make it after hearing how Qwen 3.5 at 397b at q2 on gguf actually performs fine and wanted to be able to run a model of that size with MLX speeds without it being completely unusable. Recently I came across this thread and it included talk about how bad the 4bit MLX is. """ MiniMax-M2.5 can't code - 10% on HumanEval+ despite 87% tool calling and 80% reasoning. Something is off with its code generation format. Great for reasoning though.