Transformers Can Implement Preconditioned Richardson Iteration for In-Context Gaussian Kernel Regression

arXiv:2605.08475v1 Announce Type: cross

Mechanistic accounts of in-context learning (ICL) have identified iterative algorithms for linear regression and related linear prediction tasks, often using linear or ReLU attention variants. For nonlinear ICL, prior work has related softmax and kernelized attention to functional-gradient-type dynamics, but it remains unclear whether a standard transformer with softmax attention can implement a convergent solver with an end-to-end prediction-error guarantee.
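For reference, the classical (non-transformer) solver named in the title can be sketched in NumPy. This is a minimal sketch under stated assumptions, not the paper's transformer construction. It assumes a kernel ridge formulation, solving (K + lam*I) alpha = y, with a Gaussian kernel of bandwidth `bandwidth`. It also assumes the simplest scalar preconditioner P = I / (n + lam); the paper's actual preconditioner is not specified here and may differ. All function and parameter names are hypothetical.

```python
import numpy as np


def gaussian_kernel(A, B, bandwidth=1.0):
    # Pairwise squared Euclidean distances between rows of A and rows of B,
    # clipped at 0 to absorb tiny negative values from floating-point error.
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))


def richardson_krr(X, y, lam=1.0, bandwidth=1.0, num_iters=2000, P=None):
    """Preconditioned Richardson iteration for (K + lam*I) alpha = y.

    Update: alpha <- alpha + P @ (y - (K + lam*I) @ alpha).
    Assumption (hypothetical default): P = I / (n + lam).
    """
    n = X.shape[0]
    A = gaussian_kernel(X, X, bandwidth) + lam * np.eye(n)
    if P is None:
        # Gaussian kernel entries lie in (0, 1], so lambda_max(K) <= n by
        # Gershgorin. With P = I/(n + lam), the eigenvalues of I - P @ A lie in
        # [0, n/(n + lam)], so the iteration contracts at rate n/(n + lam) < 1.
        P = np.eye(n) / (n + lam)
    alpha = np.zeros(n)
    for _ in range(num_iters):
        residual = y - A @ alpha  # current residual of the linear system
        alpha = alpha + P @ residual
    return alpha


def predict(X_train, alpha, X_query, bandwidth=1.0):
    # Kernel regression prediction: f(x) = sum_i alpha_i * k(x, x_i).
    return gaussian_kernel(X_query, X_train, bandwidth) @ alpha
```

With the default scalar preconditioner, the contraction factor n/(n + lam) approaches 1 as n grows or lam shrinks, which is exactly the slow-convergence regime a better preconditioner is meant to improve.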