AI RESEARCH

From Topic to Transition Structure: Unsupervised Concept Discovery at Corpus Scale via Predictive Associative Memory

arXiv cs.AI

arXiv:2603.18420v1 Announce Type: new Embedding models group text by semantic content: what text is about. We show that temporal co-occurrence within texts discovers a different kind of structure: recurrent transition-structure concepts, or what text does. We train a 29.4M-parameter contrastive model on 373M co-occurrence pairs from 9,766 Project Gutenberg texts (24.96M passages), mapping pre-trained embeddings into an association space where passages with similar transition structure cluster together.
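
To make the training setup concrete, the following is a minimal sketch of how co-occurrence pairs and a contrastive objective could fit together. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the co-occurrence window or the loss, so the `window` parameter, the InfoNCE loss with in-batch negatives, the `temperature` value, and the function names `cooccurrence_pairs` and `info_nce` are all hypothetical choices.

```python
import math


def cooccurrence_pairs(num_passages, window):
    """Index pairs (i, j), i < j, of passages at most `window` positions apart.

    Assumption: "temporal co-occurrence" means proximity within one text.
    """
    pairs = []
    for i in range(num_passages):
        last = min(i + window, num_passages - 1)
        for j in range(i + 1, last + 1):
            pairs.append((i, j))
    return pairs


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss; row k of `positives` is the positive for row k of `anchors`.

    Every other row in the batch serves as a negative (an assumed design).
    """
    a = [normalize(v) for v in anchors]
    p = [normalize(v) for v in positives]
    total = 0.0
    for k, u in enumerate(a):
        logits = [dot(u, v) / temperature for v in p]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_z - logits[k]
    return total / len(a)
```

In such a setup, the vectors passed to `info_nce` would be the outputs of the trainable mapping applied to frozen pre-trained embeddings, so that minimizing the loss pulls co-occurring passages together in the association space.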