Subcritical Signal Propagation at Initialization in Normalization-Free Transformers
arXiv cs.LG
arXiv:2604.11890v1 (Announce Type: new). We study signal propagation at initialization in transformers through the averaged partial Jacobian norm (APJN), a measure of gradient amplification across layers. We extend APJN analysis to transformers with bidirectional attention and permutation-symmetric input token configurations by deriving recurrence relations for activation statistics and APJNs across layers. Our theory predicts how attention modifies the asymptotic behavior of the APJN at large depth and matches APJNs measured in deep vision transformers.
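To make the APJN concrete, here is a minimal numerical sketch. It is not the paper's method: it uses a plain fully connected network rather than a transformer, and it assumes the common definition of the APJN as the squared Frobenius norm of the layer-to-layer Jacobian, divided by the width and averaged over random initializations. The function name `apjn_mlp` and all of its parameters are hypothetical and chosen only for illustration.

```python
import numpy as np

def apjn_mlp(depth, width, sigma_w=1.0, activation="tanh", n_samples=20, seed=0):
    """Monte Carlo estimate of the APJN between layer 0 and layer `depth`
    for a toy bias-free MLP, h^{l+1} = W^{l+1} phi(h^l).

    Assumption (not taken from the abstract): the APJN is
    E[ ||d h^L / d h^0||_F^2 ] / width, averaged over random weights.
    """
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_samples):
        h = rng.standard_normal(width)   # preactivations at layer 0
        jac = np.eye(width)              # d h^0 / d h^0
        for _ in range(depth):
            if activation == "tanh":
                act, dact = np.tanh(h), 1.0 - np.tanh(h) ** 2
            else:                        # "linear": identity activation
                act, dact = h, np.ones_like(h)
            # Weights drawn i.i.d. with variance sigma_w^2 / width.
            w = rng.standard_normal((width, width)) * sigma_w / np.sqrt(width)
            # Chain rule: d h^{l+1} / d h^0 = W diag(phi'(h^l)) d h^l / d h^0.
            jac = (w * dact) @ jac
            h = w @ act
        total += np.sum(jac ** 2) / width
    return total / n_samples

if __name__ == "__main__":
    for depth in (1, 2, 4, 8):
        print(depth,
              apjn_mlp(depth, 64, activation="linear"),
              apjn_mlp(depth, 64, activation="tanh"))
```

For the linear toy network the expected APJN is sigma_w^(2 * depth), so it stays near 1 at sigma_w = 1, while a saturating activation such as tanh shrinks it with depth. The abstract's contribution is the analogous analysis for attention layers, which this sketch does not model.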