Neurons Speak in Ranges: Breaking Free from Discrete Neuronal Attribution

ArXi:2502.06809v3 Announce Type: replace-cross Pervasive polysemanticity in large language models (LLMs) undermines discrete neuron-concept attribution, posing a significant challenge for model interpretation and control. We systematically analyze both encoder and decoder based LLMs across diverse datasets, and observe that even highly salient neurons for specific semantic concepts consistently exhibit polysemantic behavior.