SigGate-GT: Taming Over-Smoothing in Graph Transformers via Sigmoid-Gated Attention

arXiv:2604.17324v1 Announce Type: new Graph transformers achieve strong results on molecular and long-range reasoning tasks, yet remain hampered by over-smoothing (the progressive collapse of node representations with depth) and attention entropy degeneration. We observe that these pathologies share a root cause with attention sinks in large language models: softmax attention's sum-to-one constraint forces every node to attend somewhere, even when no informative signal exists.
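
The sum-to-one constraint can be illustrated with a small sketch. This is not the paper's implementation: the helper names (`softmax`, `sigmoid`, `attend`), the scalar node values, and the choice to apply a single sigmoid gate to the aggregated output are all assumptions made for illustration. The sketch shows only that softmax weights sum to one even when every score is uniformly low, whereas a multiplicative sigmoid gate can scale the aggregated message toward zero.

```python
import math


def softmax(scores):
    # Numerically stable softmax over a list of raw attention scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def attend(scores, values, gate_logit=None):
    # Softmax-weighted aggregation of scalar neighbour values.
    # If gate_logit is given, the aggregated output is scaled by a sigmoid
    # gate (hypothetical placement, assumed here for illustration only).
    weights = softmax(scores)
    out = sum(w * v for w, v in zip(weights, values))
    if gate_logit is not None:
        out = sigmoid(gate_logit) * out
    return weights, out


# Every score is equally low: no neighbour carries an informative signal.
scores = [-9.0, -9.0, -9.0]
values = [1.0, 2.0, 3.0]

weights, plain = attend(scores, values)
_, gated = attend(scores, values, gate_logit=-6.0)

print(sum(weights))  # softmax weights still sum to one
print(plain)         # the node still receives the neighbourhood average
print(gated)         # the gate scales the message close to zero
```

Because softmax is invariant to a constant shift in the scores, uniformly low scores yield the same uniform weights as uniformly high ones, so the node averages its neighbours regardless. In this sketch the gate is the only term that can express "attend to nothing".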