Abliterating Qwen3.5-397B on a Mac Studio revealed that MoE models encode refusal differently than dense models — safety refusals route through expert selection and survive weight-baking

r/LocalLLaMA
Generative AI AI Tools

Part of a series documenting building a fully local AI assistant on DGX Sparks + Mac Studio. I adapted FailSpy's abliteration technique for Qwen3.5-397B-A17B at 4-bit on a Mac Studio M3 Ultra (512GB). The goal was removing PRC censorship (Tiananmen, Taiwan, Uyghurs, Winnie the Pooh) from my personal assistant. Three findings I haven't seen documented anywhere: MoE models have two separable refusal subspaces. Chinese-political and Western-safety refusals are different directions in activation space. You can surgically remove one without touching the other.