Large-scale evidence for logarithmic effects of word predictability on reading time

Published in Proceedings of the National Academy of Sciences, 2024

During real-time language comprehension, our minds rapidly decode complex meanings from sequences of words. The difficulty of doing so is known to be related to words’ contextual predictability, but what cognitive processes do these predictability effects reflect? In one view, predictability effects reflect facilitation due to anticipatory processing of words that are predictable from context. This view predicts a linear effect of predictability on processing demand. In another view, predictability effects reflect the costs of probabilistic inference over sentence interpretations. This view predicts either a logarithmic or a superlogarithmic effect of predictability on processing demand, depending on whether it assumes pressures toward a uniform distribution of information over time. The empirical record is currently mixed. Here, we revisit this question at scale: We analyze six reading datasets, estimate next-word probabilities with diverse statistical language models, and model reading times using recent advances in nonlinear regression. Results support a logarithmic effect of word predictability on processing difficulty, which favors probabilistic inference as a key component of human language processing.
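The competing predictions can be made concrete with a small sketch. It is only an illustration of the three functional forms named in the abstract, not the authors' analysis, which estimates the shape of the effect with nonlinear regression. The function names, the `slope` parameter, and the exponent `k` are hypothetical choices made for this example. The logarithmic form is written in terms of surprisal, -log2 p, where p is a word's probability in context.

```python
import math


def surprisal(p: float) -> float:
    """Surprisal of a word in bits: -log2 of its contextual probability."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return -math.log2(p)


def linear_cost(p: float, slope: float = 1.0) -> float:
    """Facilitation view: cost falls linearly as probability rises."""
    return slope * (1.0 - p)


def log_cost(p: float, slope: float = 1.0) -> float:
    """Inference view: cost is proportional to surprisal."""
    return slope * surprisal(p)


def superlog_cost(p: float, slope: float = 1.0, k: float = 1.5) -> float:
    """Inference view with a uniformity pressure: cost grows faster than
    surprisal. The exponent k > 1 is an illustrative assumption."""
    return slope * surprisal(p) ** k


if __name__ == "__main__":
    # Each step makes the word ten times less probable.
    for p in (0.5, 0.05, 0.005):
        print(f"p={p:<6} linear={linear_cost(p):.3f} "
              f"log={log_cost(p):.3f} superlog={superlog_cost(p):.3f}")
```

Under the logarithmic form, every tenfold drop in probability adds the same cost (log2 10 bits times the slope). Under the linear form, the same drop matters less and less as probabilities shrink, and under the superlogarithmic form it matters more and more. This difference in shape at low probabilities is what distinguishes the hypotheses empirically.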

@article{shain2024largescale,
    author = {
        Cory Shain and
        Clara Meister and
        Tiago Pimentel and
        Ryan Cotterell and
        Roger Levy
    },
    journal = {Proceedings of the National Academy of Sciences},
    title = {Large-scale evidence for logarithmic effects of word predictability on reading time},
    year = {2024},
    volume = {121},
    number = {10},
    doi = {10.1073/pnas.2307876121},
    url = {https://www.pnas.org/doi/abs/10.1073/pnas.2307876121},
    pages = {e2307876121},
}