N-gram-like Language Models Predict Reading Time Best

arXiv:2603.09872v1 Recent work has found that contemporary language models such as transformers can become so accurate at next-word prediction that the probabilities they compute become worse predictors of reading time. In this paper, we propose an explanation: reading time is sensitive to simple n-gram statistics rather than to the more complex statistics learned by state-of-the-art transformer language models.
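
To make "simple n-gram statistics" concrete, here is a minimal sketch of how per-word surprisal could be computed from bigram counts. It is an illustration under stated assumptions, not the paper's method: the function name `bigram_surprisals`, the add-alpha smoothing, and the toy corpus are all hypothetical choices made for this example. In reading-time studies, surprisal values like these are typically used as predictors in a regression against measured reading times.

```python
import math
from collections import Counter


def bigram_surprisals(train_tokens, test_tokens, alpha=1.0):
    """Return (word, surprisal in bits) for each word in test_tokens after the first.

    Assumption: probabilities come from an add-alpha smoothed bigram model,
    chosen here only for simplicity.
    """
    # The vocabulary covers both token lists so every test word gets nonzero probability.
    vocab_size = len(set(train_tokens) | set(test_tokens))
    # How often each word occurs as the first element of a bigram.
    context_counts = Counter(train_tokens[:-1])
    # How often each adjacent word pair occurs.
    bigram_counts = Counter(zip(train_tokens, train_tokens[1:]))

    results = []
    for prev, word in zip(test_tokens, test_tokens[1:]):
        prob = (bigram_counts[(prev, word)] + alpha) / (
            context_counts[prev] + alpha * vocab_size
        )
        results.append((word, -math.log2(prob)))
    return results


# Toy corpus, hypothetical and far too small for real estimates.
train = "the cat sat on the mat".split()
seen = bigram_surprisals(train, "the cat".split())
unseen = bigram_surprisals(train, "the on".split())
print(seen, unseen)
```

In this toy run, the attested bigram "the cat" receives lower surprisal than the unattested "the on", which is the kind of local count-based signal the abstract contrasts with the statistics learned by transformer models.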