Qwen-Scope: Official Sparse Autoencoders (SAEs) for Qwen 3.5 models

r/LocalLLaMA

The Qwen team has released Qwen-Scope, a collection of Sparse Autoencoders (SAEs) for the Qwen 3.5 family (from 2B to 35B MoE). They’ve mapped internal features for the residual stream across all layers.

What is this exactly? Think of it as a dictionary of the model's internal concepts. Instead of looking at raw activation values, you can see specific "features" that represent concepts like "legal talk", "Python code", or "refusal".

What can you do with this?

Surgical Abliteration: You can find the exact feature ID for refusal/moralizing and suppress it.
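For anyone wondering what "suppress a feature" looks like mechanically, here is a rough toy sketch. It is not the Qwen-Scope API or file format: the weights are made up, the names (`encode`, `suppress_feature`, `W_enc`, `W_dec`, `feature_id`) are hypothetical, and it assumes a plain ReLU SAE. The idea is to read the feature's activation from the residual vector, then subtract that activation times the feature's decoder direction.

```python
# Toy illustration only: hand-written weights, NOT real Qwen-Scope SAE weights.
# Assumes a plain ReLU SAE: acts = ReLU(W_enc @ resid + b_enc),
# where each row of W_dec is one feature's direction in the residual stream.

def encode(resid, W_enc, b_enc):
    """Return SAE feature activations for one residual-stream vector."""
    return [
        max(0.0, sum(w * r for w, r in zip(row, resid)) + b)
        for row, b in zip(W_enc, b_enc)
    ]

def suppress_feature(resid, W_enc, b_enc, W_dec, feature_id, scale=0.0):
    """Rescale one feature's contribution to the residual vector.

    scale=0.0 removes the feature entirely, scale=1.0 leaves resid unchanged.
    Only the chosen feature's decoder direction is touched, so whatever the
    SAE fails to reconstruct stays in the vector as-is.
    """
    act = encode(resid, W_enc, b_enc)[feature_id]
    delta = (scale - 1.0) * act
    return [r + delta * d for r, d in zip(resid, W_dec[feature_id])]

# Hypothetical 3-feature SAE over a 4-dimensional residual stream.
W_dec = [
    [1.0, 0.0, 0.0, 0.0],  # pretend feature 0 is "refusal"
    [0.0, 1.0, 0.0, 0.0],  # pretend feature 1 is "Python code"
    [0.0, 0.0, 1.0, 0.0],  # pretend feature 2 is "legal talk"
]
W_enc = W_dec              # tied weights, purely to keep the toy small
b_enc = [0.0, 0.0, 0.0]

resid = [2.0, 0.5, 0.0, 0.3]
before = encode(resid, W_enc, b_enc)
steered = suppress_feature(resid, W_enc, b_enc, W_dec, feature_id=0)
after = encode(steered, W_enc, b_enc)

print("activations before:", before)
print("activations after: ", after)
```

In a real setup the same arithmetic would presumably run on tensors inside a forward hook at the layer the SAE was trained on, with the feature ID looked up from the released dictionary. The details depend on how the weights are actually packaged, so treat the above as the idea rather than a recipe.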