AI RESEARCH
Critical Windows of Complexity Control: When Transformers Decide to Reason or Memorize
arXiv CS.AI
arXiv:2605.04396v1 Announce Type: cross Recent work has shown that Transformers' compositional generalization is governed by complexity control (initialization scale and weight decay), which steers