Back into Plato's Cave: Examining Cross-modal Representational Convergence at Scale

arXiv:2604.18572v1

The Platonic Representation Hypothesis suggests that neural networks trained on different modalities (e.g., text and images) align and eventually converge toward the same representation of reality. If true, this has significant implications for whether the choice of modality matters at all. We show that the experimental evidence for this hypothesis is fragile and depends critically on the evaluation regime.
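The abstract does not say how "alignment" between modalities is quantified. The sketch below assumes a mutual k-nearest-neighbour score of the kind used in the work that originally proposed the hypothesis. It is not necessarily the metric or the evaluation regime used in this paper. The function names and the choice of cosine similarity are illustrative assumptions. For each paired sample, the score compares its k nearest neighbours in one model's embedding space with its k nearest neighbours in the other model's space, then averages the overlap fraction.

```python
import math
from typing import List, Sequence, Set

Vector = Sequence[float]


def _cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two non-zero vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def knn_indices(feats: List[Vector], i: int, k: int) -> Set[int]:
    """Indices of the k most similar samples to sample i, excluding i itself.

    Ties are broken by lower index so the result is deterministic.
    """
    sims = [(_cosine(feats[i], feats[j]), j) for j in range(len(feats)) if j != i]
    sims.sort(key=lambda t: (-t[0], t[1]))
    return {j for _, j in sims[:k]}


def mutual_knn_alignment(feats_a: List[Vector], feats_b: List[Vector], k: int) -> float:
    """Hypothetical mutual-kNN alignment score in [0, 1].

    feats_a[i] and feats_b[i] must embed the same underlying item
    (e.g. an image and its caption) with two different models.
    """
    if len(feats_a) != len(feats_b):
        raise ValueError("feature lists must be paired one-to-one")
    n = len(feats_a)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of samples")
    total = 0.0
    for i in range(n):
        # Fraction of sample i's neighbours that the two spaces agree on.
        overlap = knn_indices(feats_a, i, k) & knn_indices(feats_b, i, k)
        total += len(overlap) / k
    return total / n
```

Identical embeddings give a score of 1.0. Two spaces whose neighbourhood structures share nothing give 0.0. Because the score depends on k and on which paired samples are evaluated, those choices are part of any evaluation regime built on it.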