The Shape of Adversarial Influence: Characterizing LLM Latent Spaces with Persistent Homology

arXiv:2505.20435v3 Announce Type: replace-cross Existing interpretability methods for Large Language Models (LLMs) predominantly capture linear directions or isolated features. This focus overlooks the high-dimensional, relational, and nonlinear geometry of model representations. We apply persistent homology (PH) to characterize how adversarial inputs reshape the geometry and topology of internal representation spaces of LLMs. This phenomenon, especially when considered across operationally different attack modes, remains poorly understood.
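
To make the idea of persistent homology on a point cloud concrete, here is a minimal sketch restricted to dimension 0 (connected components) of a Vietoris-Rips filtration. It is not the authors' pipeline: the function name `h0_persistence` and the 2-D toy points standing in for hidden-state vectors are assumptions made for illustration only. The sketch relies on the standard fact that every component is born at scale 0 and dies at the length of the edge that merges it into another component, so the death times equal the edge lengths of a minimum spanning tree.

```python
import math
from itertools import combinations


def h0_persistence(points):
    """Return the death times of 0-dimensional classes, in increasing order.

    Each point starts as its own component (born at scale 0). Processing
    edges from shortest to longest, a component dies whenever an edge
    joins two previously separate components. The one component that
    survives forever is omitted from the result.
    """
    n = len(points)
    parent = list(range(n))  # union-find forest over point indices

    def find(i):
        # Follow parent pointers to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, sorted by Euclidean length.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j  # merge the two components
            deaths.append(length)    # one component dies at this scale
    return deaths


# Hypothetical toy data: four "clean" points on a unit square, and the
# same set with one point displaced far away (a stand-in for a perturbed
# representation, not real model activations).
clean = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
perturbed = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]

print("clean deaths:    ", h0_persistence(clean))
print("perturbed deaths:", h0_persistence(perturbed))
```

In this toy case the displaced point produces one long-lived component, which is the kind of change in a persistence summary that a topological comparison would pick up. Higher-dimensional features (loops, voids) need a full PH library and are outside the scope of this sketch.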