AI RESEARCH
The Spectral Lifecycle of Transformer Training: Transient Compression Waves, Persistent Spectral Gradients, and the Q/K–V Asymmetry
arXiv cs.LG
arXiv:2604.22778v1. We present the first systematic study of weight matrix singular value spectra during transformer pre