
Stochastic Parroting in Temporal Attention -- Regulating the Diagonal Sink

arXiv cs.LG

arXiv:2602.10956v3 Announce Type: replace

Spatio-temporal models analyze spatial structures and temporal dynamics, which makes them prone to information degeneration across space and time. Prior literature has demonstrated that over-squashing in causal attention or temporal convolutions biases models toward the first tokens. To analyze whether such a bias is present in temporal attention mechanisms, we derive sensitivity bounds on the expected value of the Jacobian of a temporal attention layer.
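The abstract does not state the bounds themselves. As a rough empirical companion, the following is a minimal sketch that estimates the quantity being bounded. It assumes a single-head causal self-attention layer over time steps with random projection weights, and it measures sensitivity as the expected Frobenius norm of each Jacobian block, E[||∂y_i/∂x_j||_F], using central finite differences and Monte Carlo sampling over Gaussian inputs. The function names (`causal_attention`, `sensitivity_matrix`, `expected_sensitivity`) and all parameters are hypothetical illustrations, not the authors' architecture or method.

```python
import numpy as np

def causal_attention(x, wq, wk, wv):
    """Hypothetical single-head causal self-attention over time steps.

    x: (T, d) sequence of per-step features; wq, wk, wv: (d, d) projections.
    Returns y: (T, d), where y[i] depends only on x[0..i].
    """
    T, d = x.shape
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(d)
    mask = np.tril(np.ones((T, T), dtype=bool))  # step i may attend to j <= i only
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v

def sensitivity_matrix(x, wq, wk, wv, eps=1e-5):
    """S[i, j] = Frobenius norm of the Jacobian block dy_i/dx_j,
    estimated with central finite differences."""
    T, d = x.shape
    jac = np.zeros((T, d, T, d))  # jac[i, a, j, b] = dy[i, a] / dx[j, b]
    for j in range(T):
        for b in range(d):
            dx = np.zeros_like(x)
            dx[j, b] = eps
            y_plus = causal_attention(x + dx, wq, wk, wv)
            y_minus = causal_attention(x - dx, wq, wk, wv)
            jac[:, :, j, b] = (y_plus - y_minus) / (2 * eps)
    return np.sqrt((jac ** 2).sum(axis=(1, 3)))

def expected_sensitivity(T=6, d=4, n_samples=32, seed=0):
    """Monte Carlo estimate of E[S] over Gaussian inputs,
    for one fixed random draw of the projection weights (an assumption)."""
    rng = np.random.default_rng(seed)
    wq, wk, wv = (rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(3))
    total = np.zeros((T, T))
    for _ in range(n_samples):
        x = rng.normal(size=(T, d))
        total += sensitivity_matrix(x, wq, wk, wv)
    return total / n_samples

if __name__ == "__main__":
    S = expected_sensitivity()
    print(np.round(S, 3))
    # Column sums: a crude measure of how much each input step influences all outputs.
    print("influence per input step:", np.round(S.sum(axis=0), 3))
```

Because of the causal mask, `S` is lower triangular (no output depends on later inputs). Comparing column sums, or stacking several such layers, is one possible way to probe numerically for the first-token bias the abstract describes; this sketch says nothing about whether the paper's bounds are tight or how its temporal attention differs from plain causal attention.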