In this paper, we describe a self-organizing neural network model of early lexical acquisition in young children. The growing lexicon is modeled by combined semantic word representations that draw on both the distributional statistics of words and their grounded semantic features. These changing semantic representations are taken to model the maturation of word meaning and serve as inputs to a growing semantic map. The model was tested on a corpus of real child-directed parental speech; the resulting map demonstrates the emergence and reorganization of various word categories, as quantified by two measures.
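The abstract above does not give implementation details, but the core mechanism it names, a self-organizing map whose inputs are combined semantic word vectors, can be sketched minimally. The following is an illustrative toy, not the paper's model: the word vectors, category labels, grid size, and training schedule are all invented for the example, with the "distributional" and "grounded" components simply concatenated into one input vector.

```python
import numpy as np

# Toy self-organizing map (SOM) sketch; hyperparameters are illustrative.
rng = np.random.default_rng(0)

def train_som(data, grid=(6, 6), epochs=40, lr0=0.5, sigma0=2.0):
    """Train a small SOM on row vectors in `data`; return the weight grid."""
    h, w = grid
    dim = data.shape[1]
    weights = rng.random((h, w, dim))
    # (h, w, 2) array of unit coordinates, used for the neighborhood function
    coords = np.stack(np.meshgrid(np.arange(h), np.arange(w), indexing="ij"), axis=-1)
    for t in range(epochs):
        lr = lr0 * (1 - t / epochs)                  # decaying learning rate
        sigma = sigma0 * (1 - t / epochs) + 0.5      # shrinking neighborhood
        for x in data:
            # best-matching unit (BMU): grid unit closest to the input
            dist = np.linalg.norm(weights - x, axis=-1)
            bmu = np.unravel_index(np.argmin(dist), (h, w))
            # Gaussian neighborhood centered on the BMU
            d2 = ((coords - np.array(bmu)) ** 2).sum(axis=-1)
            nb = np.exp(-d2 / (2 * sigma ** 2))[..., None]
            weights += lr * nb * (x - weights)       # pull units toward x
    return weights

def bmu_of(weights, x):
    """Grid coordinates of the unit whose weights best match x."""
    dist = np.linalg.norm(weights - x, axis=-1)
    return np.unravel_index(np.argmin(dist), dist.shape)

# Invented "lexicon": two word categories with distinct feature profiles,
# standing in for semantically distinct groups of words.
animals = rng.normal(0.0, 0.1, (10, 8)) + np.array([1, 1, 1, 1, 0, 0, 0, 0])
tools   = rng.normal(0.0, 0.1, (10, 8)) + np.array([0, 0, 0, 0, 1, 1, 1, 1])
data = np.vstack([animals, tools])
W = train_som(data)
```

After training, words from the two toy categories map to different regions of the grid, which is the kind of topographic category structure the quantitative measures in the paper are meant to track.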
Evidence from behavioral studies demonstrates that spoken language guides attention in a related visual scene and that attended scene information can in turn influence comprehension. Here we model sentence comprehension within visual contexts. A recurrent neural network is trained to associate the linguistic input with the visual scene and to produce an interpretation of the described event, which is part of that scene. We also investigate a feedback mechanism that enables explicit utterance-mediated attention shifts to the relevant part of the scene. We compare four models, a simple recurrent network (SRN) and three models with specific types of additional feedback, in order to explore the role of the attention mechanism in comprehension. The results show that all networks not only learn to produce the correct interpretation at the end of the sentence but also exhibit predictive behavior, anticipating upcoming constituents. As expected, the SRN performs very well, but the comparison shows that adding an explicit attention mechanism does not degrade performance and even yields a slight improvement in one of the models.
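The baseline architecture named above, the simple recurrent network (Elman network), can be sketched in a few lines. The sketch below is untrained and purely illustrative: the dimensions, one-hot word coding, and linear readout are assumptions for the example, not the paper's configuration, and the feedback variants with explicit attention are not shown.

```python
import numpy as np

# Minimal simple recurrent network (SRN / Elman network) sketch.
rng = np.random.default_rng(1)

class SRN:
    def __init__(self, n_in, n_hid, n_out):
        s = 0.1
        self.W_ih = rng.normal(0, s, (n_hid, n_in))   # input -> hidden
        self.W_hh = rng.normal(0, s, (n_hid, n_hid))  # context (copy-back) weights
        self.W_ho = rng.normal(0, s, (n_out, n_hid))  # hidden -> output readout
        self.n_hid = n_hid

    def forward(self, seq):
        """Process a sequence of input vectors; return per-step outputs
        and the final hidden state."""
        h = np.zeros(self.n_hid)
        outs = []
        for x in seq:
            # hidden state mixes the current word with the previous context,
            # so the output at each step can anticipate upcoming material
            h = np.tanh(self.W_ih @ x + self.W_hh @ h)
            outs.append(self.W_ho @ h)  # interpretation read out at every step
        return np.array(outs), h

# One-hot "words" as inputs; in the paper's setting the output at the
# sentence end would be the interpretation of the described event.
vocab, hid, out_dim = 12, 20, 6
net = SRN(vocab, hid, out_dim)
sentence = np.eye(vocab)[[3, 7, 1, 9]]   # invented four-word toy sentence
outputs, final_h = net.forward(sentence)
```

Because the hidden state accumulates context word by word, reordering the input words changes the final state, which is what lets such a network build up an incremental, order-sensitive interpretation of the sentence.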