Working Memory Constraints Scaffold Learning in Transformers under Data Scarcity
arXiv cs.LG
arXiv:2604.20789v1 Announce Type: cross We investigate the integration of human-like working memory constraints into the Transformer architecture and implement several cognitively inspired attention variants, including fixed-width window-based and temporal decay-based attention mechanisms. Our modified GPT-2 models are trained from scratch on developmentally plausible datasets (10M and 100M words). Performance is evaluated on grammatical judgment tasks (BLiMP) and on alignment with human reading time data.
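To make the two families of attention variants concrete, the following is a minimal NumPy sketch, not the authors' implementation. The abstract does not give the exact formulations, so everything here is an assumption for illustration: the function names (`windowed_causal_mask`, `decay_bias`, `attention_weights`), the hard cutoff at `window` tokens, and the linear-in-distance logit penalty controlled by `rate`.

```python
import numpy as np


def windowed_causal_mask(seq_len, window):
    """Boolean mask: query i may attend to key j only if 0 <= i - j < window.

    Assumption: a hard, fixed-width causal window stands in for a limited
    working memory span.
    """
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    dist = i - j
    return (dist >= 0) & (dist < window)


def decay_bias(seq_len, rate):
    """Additive logit bias: -rate * (i - j) for past keys, -inf for future keys.

    Assumption: temporal decay is modeled as a linear penalty on the attention
    logits, which corresponds to exponential decay of the unnormalized weights.
    """
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    dist = (i - j).astype(float)
    return np.where(dist >= 0, -rate * dist, -np.inf)


def attention_weights(scores, mask=None, bias=None):
    """Row-wise softmax over raw scores with an optional bias and mask."""
    logits = scores.astype(float)
    if bias is not None:
        logits = logits + bias
    if mask is not None:
        logits = np.where(mask, logits, -np.inf)
    # Subtract the row maximum for numerical stability; the diagonal is always
    # allowed under both variants, so each row has a finite maximum.
    logits = logits - logits.max(axis=-1, keepdims=True)
    weights = np.exp(logits)
    return weights / weights.sum(axis=-1, keepdims=True)


# Uniform raw scores isolate the effect of each constraint.
scores = np.zeros((4, 4))
w_window = attention_weights(scores, mask=windowed_causal_mask(4, window=2))
w_decay = attention_weights(scores, bias=decay_bias(4, rate=np.log(2)))
```

With uniform scores, the windowed variant splits attention evenly over the most recent `window` tokens and assigns zero weight to anything older, while the decay variant halves the unnormalized weight for each additional token of distance when `rate` is `log(2)`. In a GPT-2-style model, either term would be applied per head to the scaled query-key scores before the softmax.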