AI RESEARCH
Transformers Trained via Gradient Descent Can Provably Learn a Class of Teacher Models
arXiv cs.LG
arXiv:2603.22801v1 Announce Type: new Transformers have achieved great success across a wide range of applications, yet the theoretical foundations of that success remain largely unexplored. To help explain the strong performance of transformers across varied scenarios and tasks, we theoretically investigate transformers as students that learn from a class of teacher models.