AI RESEARCH

Stochastic Attention via Langevin Dynamics on the Modern Hopfield Energy

arXiv cs.LG

arXiv:2603.06875v1 Announce Type: new

Attention heads retrieve: given a query, they return a softmax-weighted average of d values. We show that this computation is one step of gradient descent on a classical energy function, and that Langevin sampling from the corresponding distribution yields stochastic attention: a
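The claim that attention is one gradient-descent step on an energy, and that Langevin sampling turns it into a stochastic procedure, can be illustrated with a small NumPy sketch. It assumes the commonly used form of the modern Hopfield energy, E(xi) = -(1/beta) * logsumexp(beta * X xi) + 0.5 * ||xi||^2, with keys and values both equal to the stored patterns X and inverse temperature beta. Under that assumption, a gradient step of size 1 from the query lands exactly on the attention output X^T softmax(beta * X xi). The sampler is plain unadjusted Langevin dynamics. The excerpt does not give the paper's actual sampler, step size, or noise scale, so the names hopfield_energy, energy_grad, langevin_attention, step and temperature are hypothetical illustrations, not the authors' code.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax for a 1-D array.
    e = np.exp(z - z.max())
    return e / e.sum()

def hopfield_energy(xi, X, beta):
    # Assumed modern Hopfield energy (constants dropped):
    # E(xi) = -(1/beta) * logsumexp(beta * X @ xi) + 0.5 * xi @ xi
    a = beta * X @ xi
    m = a.max()
    lse = m + np.log(np.exp(a - m).sum())
    return -lse / beta + 0.5 * xi @ xi

def energy_grad(xi, X, beta):
    # Gradient of the assumed energy: xi - X^T softmax(beta * X xi).
    return xi - X.T @ softmax(beta * X @ xi)

def attention(xi, X, beta):
    # Single-query softmax attention with keys = values = X.
    return X.T @ softmax(beta * X @ xi)

def langevin_attention(xi0, X, beta, step=0.1, temperature=0.05,
                       n_steps=200, rng=None):
    # Hypothetical sampler: unadjusted Langevin dynamics,
    # xi <- xi - step * grad E(xi) + sqrt(2 * step * temperature) * noise,
    # which approximately samples p(xi) proportional to exp(-E(xi) / temperature).
    rng = np.random.default_rng() if rng is None else rng
    xi = xi0.copy()
    for _ in range(n_steps):
        noise = rng.standard_normal(xi.shape)
        xi = xi - step * energy_grad(xi, X, beta) \
             + np.sqrt(2.0 * step * temperature) * noise
    return xi

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 4))   # 5 stored patterns of dimension 4
q = rng.standard_normal(4)        # query
beta = 2.0

# One gradient step with unit step size reproduces ordinary attention.
one_step = q - 1.0 * energy_grad(q, X, beta)
print(np.allclose(one_step, attention(q, X, beta)))

# Repeated noisy steps give a distribution of retrievals instead of one point.
samples = np.stack([langevin_attention(q, X, beta, rng=rng) for _ in range(10)])
print(samples.mean(axis=0))
```

Setting temperature to 0 removes the noise and reduces the sampler to deterministic gradient descent on E; with step=1 and a single iteration, that is exactly the attention update.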