E-TCAV: Formalizing Penultimate Proxies for Efficient Concept Based Interpretability

arXiv:2605.10261v1 Announce Type: new TCAV (Testing with Concept Activation Vectors) is an interpretability method that assesses how well the internal representations of a trained neural network align with human-understandable, high-level concepts. Though effective, TCAV suffers from significant computational overhead, disagreement between TCAV scores computed at different layers, and statistical instability. This work takes a step toward addressing these challenges by