AI RESEARCH

The Spectral Lifecycle of Transformer Training: Transient Compression Waves, Persistent Spectral Gradients, and the Q/K--V Asymmetry

arXiv cs.LG

arXiv:2604.22778v1 Announce Type: new We present the first systematic study of weight matrix singular value spectra \emph{during} transformer pre