I tracked a major cache reuse issue down to Qwen 3.5’s chat template

Over the last week, I’ve been investigating cache misses while optimizing local agent workflows on my M5 Max. My setup used oMLX.ai as the backend, with agents such as OpenCode.ai and Pi.de, but I reproduced the same behavior with other backends, including llama.cpp. At first, I assumed this was an inference engine issue or a bug in the cache implementation.