Dual DGX Sparks vs Mac Studio M3 Ultra 512GB: Running Qwen3.5 397B locally on both. Here's what I found.

I was spending about $2K/month on Claude API tokens for a personal AI assistant I run through Slack. After about 45 days of that cost pain I decided to go local. Bought both a dual DGX Spark setup and a Mac Studio M3 Ultra 512GB, each cost me about $10K after taxes. Same price, completely different machines. Here is what I learned running Qwen3.5 397B A17B on both. The Mac Studio MLX 6 bit quantization, 323GB model loaded into 512GB unified memory. 30 to 40 tok/s generation. The biggest selling point is memory bandwidth at roughly 800 GB/s.