AI RESEARCH
On the Convergence of Gradient Descent on Learning Transformers with Residual Connections
arXiv CS.LG
arXiv:2506.05249v4 Announce Type: replace
Transformer models have emerged as fundamental tools across various scientific and engineering disciplines, owing to their outstanding performance in diverse applications. Despite this empirical success, the theoretical foundations of Transformers remain relatively underdeveloped, particularly in understanding their