Gradient Boosting within a Single Attention Layer

arXiv:2604.03190v1 Announce Type: cross Transformer attention computes a single softmax-weighted average over values, a one-pass estimate that cannot correct its own errors. We