OpenClaw + oMLX shows 0 cached tokens, but Hermes uses the cache fine with the same local model. What am I missing?
r/LocalLLaMA
Hey everyone, I’m trying to debug a weird prompt cache issue with OpenClaw + oMLX, and I’d appreciate help from anyone running local agents on MLX/oMLX.

The short version is this: I’m running oMLX v0.3.8 on my Mac, serving:

Qwen3.6-35B-A3B-RotorQuant-MLX-4bit

OpenClaw runs in Docker on my NAS and connects to oMLX through Tailscale / Docker extra host:

Hermes WebUI / Hermes Agent also uses the same oMLX server and same model, and cache works fine there. So I don’t think this is simply “Qwen can’t cache” or “oMLX cache is broken.”
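One way to narrow this down is to capture the raw request and response bodies from both clients and compare them offline. The sketch below is only a starting point and rests on two assumptions that are not confirmed by the post: that oMLX reports cache hits in the OpenAI-style `usage.prompt_tokens_details.cached_tokens` field, and that its cache reuses a prompt only up to the first character where two prompts differ. The helper names `cached_tokens` and `shared_prefix_len` are hypothetical, made up for illustration.

```python
import json


def cached_tokens(response_body: str) -> int:
    """Read the cached prompt token count from an OpenAI-style response body.

    Assumption: the server uses usage.prompt_tokens_details.cached_tokens.
    Returns 0 when the field is missing or null.
    """
    usage = json.loads(response_body).get("usage") or {}
    details = usage.get("prompt_tokens_details") or {}
    return int(details.get("cached_tokens") or 0)


def shared_prefix_len(first_prompt: str, second_prompt: str) -> int:
    """Count how many leading characters two rendered prompts have in common."""
    length = 0
    for a, b in zip(first_prompt, second_prompt):
        if a != b:
            break
        length += 1
    return length
```

If two consecutive OpenClaw prompts share only a short prefix (for example because a timestamp or session ID sits near the top of the system prompt) while two consecutive Hermes prompts share almost everything, that would be consistent with the difference in cached tokens, though it would not prove it.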