Disentangling MLP Neuron Weights in Vocabulary Space

arXiv cs.CL

arXiv:2604.06005v1. Interpreting the information encoded in model weights remains a fundamental challenge in mechanistic interpretability. In this work, we