Computational modeling of early language learning from acoustic speech and audiovisual input without linguistic priors

arXiv:2603.08359v1

Learning to understand speech appears almost effortless for typically developing infants, yet from an information-processing perspective, acquiring a language from acoustic speech is an enormous challenge. This chapter reviews recent developments in using computational models to understand early language acquisition from speech and audiovisual input. The focus is on self-supervised and visually grounded models of perceptual learning.