SPARC: Concept-Aligned Sparse Autoencoders for Cross-Model and Cross-Modal Interpretability

ArXi:2507.06265v2 Announce Type: replace-cross Understanding how different AI models encode the same high-level concepts, such as objects or attributes, remains challenging because each model typically produces its own isolated representation. Existing interpretability methods like Sparse Autoencoders (SAEs) produce latent concepts individually for each model, resulting in incompatible concept spaces and limiting cross-model interpretability. To address this, we