Hypothesis-Driven Feature Manifold Analysis in LLMs via Supervised Multi-Dimensional Scaling

ArXi:2510.01025v2 Announce Type: replace The linear representation hypothesis states that language models (LMs) encode concepts as directions in their latent space, forming organized, multidimensional manifolds. Prior work has largely focused on identifying specific geometries for individual features, limiting its ability to generalize. We