AI RESEARCH
Disentangling MLP Neuron Weights in Vocabulary Space
arXiv CS.CL
arXiv:2604.06005v1 Announce Type: new Interpreting the information encoded in model weights remains a fundamental challenge in mechanistic interpretability. In this work, we