Neuron-Anchored Rule Extraction for Large Language Models via Contrastive Hierarchical Ablation

arXiv:2605.03058v1 Announce Type: new A key goal of explainable AI (XAI) is to express the decision logic of large language models (LLMs) in symbolic form and link it to internal mechanisms. Global rule-extraction methods typically learn symbolic surrogates without grounding rules in model circuitry, while mechanistic interpretability can connect behaviors to neuron sets but often depends on hand-crafted hypotheses and expensive neuron-level interventions. We