AI RESEARCH
Multi-Headed Transformer Architectures as Time-dependent Wasserstein Gradient Flows
arXiv cs.LG
arXiv:2605.18870v1 Announce Type: new In recent years, transformer architectures have revolutionized the field of language processing, opening the door to previously unforeseen possibilities. From a theoretical point of view, however, the mathematical models proposed in the literature often lack direct contact with the actual architectures and depend on strong simplifying assumptions.