AI RESEARCH

Optimal Attention Temperature Improves the Robustness of In-Context Learning under Distribution Shift in High Dimensions

arXiv cs.LG • May 12, 2026

arXiv:2511.01292v2 Announce Type: replace-cross Pretrained Transformers can perform in-context learning (ICL) from a few demonstrations, but this ability can fail sharply when the test distribution differs from pre
