On the Geometric Structure of Layer Updates in Deep Language Models

arXiv:2604.02459v1 Announce Type: cross We study the geometric structure of layer updates in deep language models. Rather than analyzing what information is encoded in intermediate representations, we ask how representations change from one layer to the next. We show that layerwise updates admit a decomposition into a dominant tokenwise component and a residual that is not captured by restricted tokenwise function classes.
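
To make the decomposition concrete, here is a minimal sketch, not the paper's procedure. It assumes the "restricted tokenwise function class" is an affine map fitted by least squares, which is only one possible choice. The function name `tokenwise_decomposition`, the synthetic hidden states, and the cumulative-mean "context" term are all hypothetical and exist only for illustration.

```python
import numpy as np

def tokenwise_decomposition(h_in, h_out):
    """Split the layer update into a tokenwise affine fit and a residual.

    h_in, h_out: arrays of shape (num_tokens, d_model) holding the hidden
    states before and after one layer. The affine function class is an
    assumption made for this sketch.
    """
    delta = h_out - h_in                                # layer update per token
    X = np.hstack([h_in, np.ones((h_in.shape[0], 1))])  # bias column -> affine map
    W, *_ = np.linalg.lstsq(X, delta, rcond=None)       # least-squares fit
    tokenwise = X @ W                                   # part predicted from the token alone
    residual = delta - tokenwise                        # part the tokenwise map misses
    explained = 1.0 - np.sum(residual**2) / np.sum(delta**2)
    return tokenwise, residual, explained

# Synthetic stand-in for real activations (hypothetical data).
rng = np.random.default_rng(0)
num_tokens, d_model = 512, 16
h_in = rng.normal(size=(num_tokens, d_model))
A = 0.1 * rng.normal(size=(d_model, d_model))

# Context term: running mean over earlier tokens, so it depends on other
# positions and cannot be fully written as a function of the token itself.
context = np.cumsum(h_in, axis=0) / np.arange(1, num_tokens + 1)[:, None]
h_out = h_in + h_in @ A + 0.5 * context

tokenwise, residual, explained = tokenwise_decomposition(h_in, h_out)
print(f"fraction of squared update norm explained tokenwise: {explained:.3f}")
```

With real models, `h_in` and `h_out` would be hidden states collected at consecutive layers over many tokens. A richer tokenwise class (for example a small MLP) could replace the affine map; whatever remains in the residual after that fit is what the chosen class fails to capture.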