A framework for analyzing concept representations in neural models

arXiv:2605.01381v1 Announce Type: cross Understanding how neural models represent human-interpretable concepts is challenging. Prior work has explored linear concept subspaces from diverse perspectives, such as probing and concept erasure. We