AI RESEARCH
Optimal Attention Temperature Improves the Robustness of In-Context Learning under Distribution Shift in High Dimensions
arXiv cs.LG
arXiv:2511.01292v2 Announce Type: replace-cross
Pretrained Transformers can perform in-context learning (ICL) from a few demonstrations, but this ability can fail sharply when the test distribution differs from pretraining…