AI RESEARCH

Semantic Shift: the Fundamental Challenge in Text Embedding and Retrieval

arXiv cs.CL

arXiv:2603.21437v1

Transformer-based embedding models rely on pooling to map variable-length text into a single vector, which enables efficient similarity search but also induces well-known geometric pathologies such as anisotropy and length-induced embedding collapse. Existing accounts largely describe what these pathologies look like, yet provide limited insight into when and why they harm downstream retrieval.
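
To make the pooling step concrete, here is a minimal sketch in Python. It assumes mean pooling, which is only one common choice; the abstract does not say which pooling scheme the paper studies. The helper names `mean_pool` and `cosine` and the toy 2-dimensional token vectors are hypothetical, made up for illustration and not taken from the paper.

```python
import math


def mean_pool(token_vectors):
    """Average a non-empty list of equal-length token vectors into one vector."""
    if not token_vectors:
        raise ValueError("need at least one token vector")
    dim = len(token_vectors[0])
    count = len(token_vectors)
    return [sum(vec[i] for vec in token_vectors) / count for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy token vectors (illustrative values only, not real model outputs).
short_text = [[1.0, 0.0], [0.0, 1.0]]
long_text = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Texts of different lengths both map to a single fixed-size vector.
short_vec = mean_pool(short_text)
long_vec = mean_pool(long_text)

# Similarity search then compares the pooled vectors directly.
similarity = cosine(short_vec, long_vec)
```

In this toy case the two pooled vectors point in the same direction even though the token sequences differ. This shows only that averaging discards token-level detail; it is not a demonstration of the pathologies the paper analyzes.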