M5 Max: Qwen 3 vs Qwen 3.5 Pre-fill Performance

Models (run in LM Studio):

- qwen3.5-9b-mlx 4bit
- qwen3VL-8b-mlx 4bit

A commenter on my previous post suggested testing Qwen 3.5 because of its new architecture.

The results: the hybrid attention architecture is a game changer for long contexts, with pre-fill nearly 2x faster at 128K+ context.

submitted by /u/M5_Maxxx
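For anyone who wants to compare their own runs the same way, here is a minimal sketch of how a pre-fill speedup figure like "nearly 2x" can be computed. It assumes pre-fill speed is taken as prompt tokens divided by time to first token; the post does not say how its numbers were measured. The function names are hypothetical, and the inputs in the usage lines are placeholders, not measurements from this test.

```python
def prefill_tokens_per_second(prompt_tokens: int, time_to_first_token_s: float) -> float:
    """Pre-fill throughput: prompt tokens processed per second.

    Assumes time-to-first-token is dominated by prompt processing.
    """
    if prompt_tokens <= 0:
        raise ValueError("prompt_tokens must be positive")
    if time_to_first_token_s <= 0:
        raise ValueError("time_to_first_token_s must be positive")
    return prompt_tokens / time_to_first_token_s


def speedup(baseline_tps: float, candidate_tps: float) -> float:
    """How many times faster the candidate is than the baseline."""
    if baseline_tps <= 0:
        raise ValueError("baseline_tps must be positive")
    return candidate_tps / baseline_tps


# Placeholder inputs for illustration only (not results from the post).
baseline = prefill_tokens_per_second(prompt_tokens=1000, time_to_first_token_s=2.0)
candidate = prefill_tokens_per_second(prompt_tokens=1000, time_to_first_token_s=1.0)
print(f"baseline: {baseline:.1f} tok/s, candidate: {candidate:.1f} tok/s, "
      f"speedup: {speedup(baseline, candidate):.2f}x")
```

Comparing both models at the same prompt length matters, since the gap reported here only shows up at long contexts.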