Little is known about the brain mechanisms involved in word learning during infancy and in second language acquisition, or about the way these new words become stable representations that sustain language processing. In several studies we have adopted the human simulation perspective, studying the effects of brain lesions and combining different neuroimaging techniques, such as event-related potentials and functional magnetic resonance imaging, in order to examine the language learning (LL) process. In the present article, we review this evidence, focusing on how different brain signatures relate to (i) the extraction of words from speech, (ii) the discovery of their embedded grammatical structure, and (iii) how meaning derived from verbal contexts can inform us about the cognitive mechanisms underlying the learning process. We compile these findings and frame them into an integrative neurophysiological model that tries to delineate the major neural networks that might be involved in the initial stages of LL. Finally, we propose that LL simulations can help us to understand natural language processing and how the recovery from language disorders in infants and adults can be accomplished.
Suppose an adult and child human arrive on Mars and discover that there are Martians who seem to speak a language to one another. If the adult and child human stay on Mars for several years and try to learn this language, what do you think will be the outcome? (Gleitman & Newport 1997, p. 22)
The quoted problem introduced by Gleitman and Newport illustrates two crucial aspects of language learning (LL). First, although infants learn their first language as part of their cognitive development, adults are also faced with this challenge when learning a second language. Second and most important, the differences and commonalities existing between the acquisition process in infants and adults might shed some light on the learning machinery necessary for mastering a new language.
Despite this fact, infant and adult language acquisition and learning processes have rarely been compared. It is clear that important factors that differ between the two populations may impact the way learning takes place. In that sense, infants are making sense of a whole world while developing other cognitive functions in parallel with language (Diamond 2002). Moreover, this development in infants is accompanied by different rates of brain maturation and myelinization in different regions, which constrains cognitive functions (Casey et al. 2000; Uylings 2006). These maturation factors add to other aspects such as the implicit (non-instructed) learning that characterizes first language acquisition, as compared to the frequently explicit training in adult LL. Nevertheless, aside from these factors, some core aspects of language acquisition might be shared between the two populations. Thus, we believe that cross-talk between the cognitive neuroscience of LL in adults and in infants is necessary and can be very fruitful, because the two fields can bring valuable information to each other, as will be outlined here. With that aim in mind, complementary neurobiological and developmental perspectives are required. Non-invasive recording of brain activity (e.g. using event-related brain potentials (ERPs) and structural and functional magnetic resonance imaging (fMRI)) and the study of the developmental changes that occur in the course of word learning (at the structural and functional brain levels) will greatly aid in filling in the missing pieces of information about LL mechanisms in infants and adults (Spelke 2002).
The present paper is divided into four parts. First, we introduce the relevant factors that have been highlighted in the two fields in an attempt to show how they are interrelated. Second, we review the different studies of adults and infants devoted to early segmentation and word recognition processes along with the mechanisms involved in the extraction of rules embedded in those words. In the third section, we present the problem of inferring the meaning of a new word from a verbal context. In the fourth section, we present an integrative proposal of the main neurophysiological mechanisms involved in LL and the interface of LL with different cognitive processes.
2. Relevant factors in first and second language acquisition
(a) First language acquisition
The last 20 years have witnessed an enormous amount of research on LL and cognitive development, mostly coming from behavioural infant studies (Saffran et al. 2006; Kuhl & Rivera-Gaxiola 2008). Although our knowledge of how infants are able to master a new language is rapidly increasing, the problem is so complex and multifaceted that the current state of the art is still far from providing a clear picture of the exact learning mechanisms involved in this rich period of our life (see Gaskell & Ellis, this issue). When considering this ‘complex problem’, we need to consider at least three important aspects: (i) the input of the learning process, (ii) the cognitive mechanisms involved in learning, and (iii) their developmental time course.
When considering the input level, several authors underscore the distinction between learning words (lexical knowledge) and rules or constraints (grammatical knowledge). The capacity to develop grammatical abilities appears even in circumstances of impoverished input or impaired intellectual abilities (e.g. in Williams syndrome). However, Martens et al. (2008), in a recent review, contradicted this idea and suggested that individuals with Williams syndrome presented in some cases both typical (but delayed) and atypical grammatical abilities. In particular, the atypical abilities were encountered in tasks measuring complex skills, such as morphosyntactic and semantic integration. In any case, the general agreement on the independence of grammar development from other cognitive abilities has led generative linguists to postulate the existence of a very powerful innate LL device for accomplishing this highly demanding task (Chomsky 2002). Thus, from the same input, the cognitive system is able to extract two different types of information for language development, namely words and rules. From the generative point of view, this is accomplished using a predefined innate LL device, whereas input-based traditions maintain that learning is accomplished by exploiting the internal characteristics of the input (Ellis 2008).
Thus, although this generative grammar account has received several criticisms, some core ideas in this view remain important at the level of cognitive processes, in the sense that, whatever the learning mechanisms we are genetically endowed with, we are able to accomplish this complex enterprise when faced with the appropriate triggering inputs. The specification at the biological level of the genetic regulatory, expressive and epigenetic mechanisms that allow humans to accomplish this task might be one of the main fields of research in the future. An unresolved issue is to what extent infants are able to learn a language because they are equipped with a very powerful general-purpose learning mechanism or because they are equipped with a language-specific acquisition device (Elman et al. 1996; Bates et al. 1998; Hauser & Bever 2008).
Some researchers have already provided alternative hypotheses to the existence of specific LL mechanisms (Golinkoff et al. 2000). For example, it has been proposed that general principles of associative learning can explain many of the LL characteristics (Ellis 2008). In the same vein, a general purpose statistical learning mechanism has been proposed to underlie a great deal of the LL phenomena (Smith 2000). Other authors have considered a broader view of the infant word-learning process, which might require the interaction of more general capacities, including conceptual and theory of mind capacities and grammatical knowledge (Bloom 2000). Similarly, an influential current view emphasizes the unavoidable fact that infants are social agents and that LL might be considered fundamentally a social activity (Tomasello 2003). This theoretical position downplays the validity of the Quinean dilemma (Quine 1960), which points out the inherent ambiguity of the meaning conveyed between two persons in a learning context (multiple meanings could always be mapped onto a new word). Because a social interaction could disambiguate the referent intentions of the speaker, infants do not require the specific internal language constraints that have been postulated to learn the meaning of a new word (Markman 1989).
In fact, the existence and availability of multiple perceptual, linguistic and social cues in the learning environment of infants lead some researchers to consider that a simple blind association mechanism might not be enough and attention is required for focusing, selecting and integrating the incoming speech and the multiple cues provided by the environment. Several convergent ideas on the importance of attention to social cues (eye gaze, pointing behaviour and joint attention) highlight the importance of considering social interaction as a key aspect when learning a language (Baldwin 1991). Notice that these new social-learning perspectives emphasize the richness of the initial infant-learning environment and the necessity that powerful cognitive capacities aid the processing and integration of this information. For example, some important concepts in this social pragmatic perspective, like social understanding or theory of mind, might require the involvement of high-level cognitive processes.
In a similar vein, Gentner & Namy (2004) have proposed another domain-general mechanism, the comparison process, involved in early word learning. This mechanism, based on research in analogy and similarity, allows storage and comparison of multiple similar experiences highlighting their commonalities and inherent abstract relations (even when exposures are separated in time). Thus, this process might be useful in infants to infer and discover the meaning of newly encountered words. The observed commonalities between two different experiences (e.g. imagine a child or an adult hearing the same label applied to different entities, as for example would be the case for the word ‘animal’) might trigger the comparison process, bypassing surface commonalities (the label ‘animal’) and extracting deeper abstract or conceptual relations (the concept of animal applies to different entities). Analogical reasoning might underlie these comparison processes. To what degree this comparison process would be responsible for the acquisition of grammatical rules is still an open question (Gomez & Maye 2005).
Finally, at the developmental level, it is very important to observe the changes in the language acquisition mechanisms across the lifespan. One of the tenets of the emergentist coalition model (ECM) (Hollich et al. 2000) is that infants might use a coalition of available learning cues in the environment (perceptual, social and linguistic) in order to learn a language. The interesting aspect proposed by this account is that the weighting of these cues and their involvement in the word-learning journey might change over time (developmental perspective). For example, infants at the early stages of learning might depend more on perceptual and salient attributes in the environment and less on linguistic properties. With maturation and after the completion of the first word-learning milestone, infants' attention might turn more to linguistic aspects in order to boost their learning resources. Similarly, some developmental theories of lexical recognition and literacy (Fowler 1991; Walley 1993) have proposed that phoneme representations undergo gradual but substantial changes during childhood language development. This position is interesting because it emphasizes how the representations that sustain phonological and lexical information interact with each other and are transformed during language development. During this process, the initial phonological representation might consist of an implicit perceptual unit used for basic speech representation, which is afterwards transformed into an explicit cognitive representation that can be used for reading tasks. More specifically, in this model the initial lexical representations in early childhood might be holistic (most probably based on units larger than phonemic segments, such as syllables or overall acoustic shape), and as the child's lexicon grows these representations might become more refined and based on phonological segments (Walley 1993; Metsala & Walley 1998).
This transformation might be partially driven by infant vocabulary growth, and therefore, by the infants' increasing need to distinguish target words in the lexicon in a faster way (Metsala & Walley 1998).
(b) Second language acquisition
Although an important subset of language studies has focused on infants' language development, it is also true that most adults learn more than one language in their lifetime. Besides the emotional circumstances that will inevitably affect this learning process (Klein 1996), the crucial question is whether the considerable learning plasticity that infants show while learning their native language during the first 2 years of life is maintained at all throughout the lifespan. The idea that brain plasticity is largely reduced in adolescents and adults led some authors to propose the existence of critical periods or sensitive time windows for acquiring native-like competence in a second language, especially with regard to phonological and morphosyntactic aspects (see for a review, Birdsong 2006). In fact, this early idea about the limits on adult brain plasticity might explain the separation of the infant and adult learning research traditions. First language acquisition has always been considered a kind of singular implicit process. Its development is circumscribed to a very narrow time window (the first 2–4 years) and characterized by a large amount of brain plasticity that allows infants to master one or multiple languages in a relatively short period of time. In contrast, second language acquisition in adulthood has always been characterized as a non-automatic, explicit and effortful process, clearly modulated by motivational and emotional factors, and relying on a rather crystallized cognitive system with no conceptual changes required.
Surprisingly, some studies of ultimate attainment have shown that a considerable number of second language learners achieve near-native performance even in phonology and syntax (Flege 1987; White & Genesee 1996; Bongaerts 1999; Hyltenstam & Abrahamsson 2000; Birdsong 2006). Birdsong (1999) has estimated that between 5 and 15 per cent of learners attain near-native performance. Montrul & Slabakova (2003) and White & Genesee (1996) reported larger estimates, ranging from 20 to 30 per cent. Interestingly, Coppieters (1987) suggested, in view of the results from extensive interviews, that native and near-native speakers of French appeared to be comparable in terms of language use and proficiency, although the two groups clearly diverged in their interpretation of sentences involving basic grammatical contrasts. All these data cast some doubt on a rigid interpretation of the sensitive time window hypothesis (see for a critical view of the critical period hypothesis in language, Seidenberg & Zevin 2006). In a similar vein, during the last decade, the concepts of neural plasticity, neurogeneration and brain repair have been carefully redefined and the emergent picture of the adult learning brain is more dynamic, open and encouraging than the previous views (Buonomano & Merzenich 1998; Bruer 2003; De Felipe 2006). However, it is important to bear in mind that similar performance levels (e.g. when comparing near-native second language learners and native speakers, or even infant and adult learning rates) do not by themselves show that the same cognitive resources or processes are involved. The relationship between performance and cognitive processes is always complex, and it is at this point that the use of complementary information from functional brain imaging and connectivity would be particularly helpful.
In a complementary fashion, and at the level of cognitive processing, second language research has also focused on the identification of individual differences in cognitive abilities related to second LL. For instance, four main abilities have been pointed out by several authors: (i) phonemic encoding ability, (ii) grammatical sensitivity, (iii) inductive LL ability, and (iv) associative memory (Carroll 1993). Another aspect that has also been highlighted by other authors is cognitive control, a central aspect in bilingualism (Rodriguez-Fornells et al. 2006). In particular, Bialystok & Sharwood-Smith (1985) introduced the difference between knowledge and control, which refers to the speed in acquiring control over a second language. This distinction is analogous to the explicit–implicit difference in the sense that the slow, effortful, attention-demanding, error-prone and feedback-dependent initial process should progressively be replaced by a non-conscious, easier, automatic, fast, errorless, non-feedback-dependent performance (DeKeyser 1997). This skilled learning process resembles the procedural learning observed in other domains (e.g. motor learning). In addition, individual differences in working memory have also been explored carefully in relation to non-word repetition and further vocabulary learning (Baddeley et al. 1998).
Finally, and after this general overview, one may return to the question of the degree of overlap in the learning mechanisms involved in infant and adult LL. In this sense, one can conceive the application of a similar research programme to the one instigated by the ECM (Hollich et al. 2000) with the aim of contrasting the interaction of multiple available cues and learning mechanisms over different stages of acquisition in infants and adults. In this respect, the human simulation paradigm (Gillette et al. 1999; Snedeker & Gleitman 2004) provides an interesting framework to interrelate infant and adult learning. These experiments are conceived as ‘simulations’ in which an adult learner is exposed to information of the kind naturally received by the infant learner being simulated. The underlying objective is to observe how well the adult simulation emulates the real child learning. Parallel findings in infants and adult second language learners would dismiss the possibility that the effects observed are due to limitations of immature cognitive mechanisms during the period of life in which infants are evaluated (Gillette et al. 1999). From this perspective, only a clear description of the cognitive resources of infants and adults, complemented with a developmental viewpoint and the input circumstances that trigger them, will allow determination of the exact learning mechanisms involved in the mastery of language. With this aim, we have adopted a similar approach in the experiments reviewed here.
3. Speech segmentation and word recognition
(a) Speech segmentation in infants using statistical learning
One of the first mandatory stages that infants and second language learners encounter when acquiring a language is to identify (segment) the units (words) that compose the speech signal (the segmentation problem). The difficulty in segmenting the speech signal into words is accentuated by the lack of clearly marked word boundaries. It is not until a certain degree of familiarity with the language is gained that learners begin to recognize possible words from the speech input. Eventually, parsing the speech stream is possible by exploiting different sources of information. Once these units are identified, they should be mapped onto conceptual representations (the word-to-world mapping problem). The present section is devoted to understanding the first process: how language learners are able to segment a new language.
Behavioural studies have pointed out the importance of a variety of cues exploited by infants during speech segmentation tasks (Jusczyk 1999; Kuhl 2004). For example, it has been shown that, among other cues embedded in the speech signal, both infants (as early as 8 months) and adults are sensitive to the distribution of phonological and acoustic regularities and can exploit this type of information to segment the continuous speech signal into word-like units (Saffran et al. 1996). This important learning mechanism has been termed statistical learning. In more detail, learners are sensitive to the fact that low transitional probabilities are found at word boundaries (low likelihood of one syllable following another), whereas high transitional probabilities are found within words. Statistical learning is considered a domain-general learning mechanism implicated not only in speech segmentation (Saffran et al. 1996) but also in diverse sequential learning situations such as artificial grammars, tone sequences or visual patterns (see Saffran et al. 2006).
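The contrast between within-word and boundary transitional probabilities can be made concrete with a small computational sketch. The Python code below is only an illustration under stated assumptions: the four trisyllabic nonsense words are invented for this example (they are not stimuli from any of the cited studies), the forward transitional probability is estimated as the count of a syllable pair divided by the count of its first syllable, and the boundary threshold of 0.75 is an arbitrary choice. It is not meant as a model of how infants or adults actually segment speech.

```python
from collections import Counter
import random

def transitional_probabilities(syllables):
    """Estimate TP(x -> y) = count(x immediately followed by y) / count(x as first of a pair)."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {(x, y): n / first_counts[x] for (x, y), n in pair_counts.items()}

def segment(syllables, tps, threshold):
    """Place a word boundary wherever the TP between adjacent syllables is below threshold."""
    words, current = [], [syllables[0]]
    for x, y in zip(syllables, syllables[1:]):
        if tps[(x, y)] < threshold:
            words.append("".join(current))
            current = []
        current.append(y)
    words.append("".join(current))
    return words

# Hypothetical lexicon, invented for illustration only.
LEXICON = [("mi", "fe", "no"), ("sa", "du", "le"), ("ko", "ga", "ti"), ("be", "hu", "pa")]

def make_stream(n_words, seed=0):
    """Concatenate randomly chosen words with no pauses and no immediate repetitions."""
    rng = random.Random(seed)
    stream, previous = [], None
    for _ in range(n_words):
        word = rng.choice([w for w in LEXICON if w != previous])
        stream.extend(word)
        previous = word
    return stream

stream = make_stream(300)
tps = transitional_probabilities(stream)
recovered = set(segment(stream, tps, threshold=0.75))
print(sorted(recovered))
```

In this toy stream every syllable belongs to a single word, so within-word transitional probabilities equal 1, whereas each word-final syllable can be followed by the first syllable of any of the three other words, which keeps boundary probabilities near one third. Real speech offers far noisier statistics, which is one reason why additional cues are thought to matter.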
Notably, the majority of studies dealing with statistical learning and speech segmentation have not addressed the important question regarding the nature of the output of the speech segmentation process. Concerning this issue, Saffran (2001) proposed that the ‘representations emerging from statistical learning may serve as candidate lexical items for infants, available for integration into the native language’ (p. 9). In this study, the authors demonstrated that infants processed the newly segmented words differently from the items that had not been segmented (non-words) when they were presented at the end of meaningful sentences. Furthermore, Graf et al. (2007) showed that 17-month-olds had an advantage in mapping segmented words to new meanings compared to non-words. Overall, these two studies argued in favour of the existence of a special proto-lexical status attained by the newly segmented words. Conceptual information can then be linked to the proto-lexical traces already stored, making the mapping-to-meaning process easier.
(b) Time-course studies of speech segmentation
Because it is difficult to directly measure the learning processes during speech segmentation relying exclusively on behavioural measures, ERPs have recently been used to investigate the time-course underlying speech segmentation in adult listeners (Sanders et al. 2002; Cunillera et al. 2006, 2008). Owing to their excellent time resolution, ERPs and magnetoencephalographic (MEG) techniques are able to inform about online covert processing, which is particularly relevant when studying fast learning processes.
In the first speech segmentation study, Sanders et al. (2002) reported evidence that segmentation from a continuous speech stream of previously taught non-sense words elicited larger amplitudes in the N100 and N400 components after training, a finding that is in agreement with the interpretation of the N400 as an index of lexical search. The N400 component has been classically related to lexical and semantic processing (Kutas & Federmeier 2000). In this regard, this study adds to the previous evidence of the proto-lexical status of the non-words in speech segmentation studies (Saffran 2001; Graf et al. 2007).
In a similar study (Cunillera et al. 2006), we focused directly on the online segmentation process in which non-sense words were discovered during exposure to 8 min continuous auditory streams without any previous training (figure 1a). By tracking the variation in the N400 and N100 components through the learning process, we were able to observe how quickly listeners were able to segment words in a new language. By analysing the learning process in 2 min blocks, we observed that frontocentral differences in the N400 component arose during the second minute of exposure to the language stream compared to a baseline condition in which syllables were presented in a random order, thus ensuring that statistical learning was not possible (see figure 1b). The maximum effect appeared between minutes 2 and 4 of exposure (figure 2c; see also De Diego-Balaguer et al. 2007; Cunillera et al. 2009).
Similar results have been observed for infants' word-learning skills where an ERP negativity modulation with a frontal distribution was also found in 14-month-old infants performing a fast learning object-word mapping task (Friedrich & Friederici 2008). Likewise, increased frontal and sustained negativities were observed by Conboy & Mills (2006) for known versus unknown words. Moreover, in a recent ERP training study, Mills et al. (2005) showed that 20-month-olds had a larger N400 to trained words paired with an object compared with untrained words. In adults, the involvement of the N400 in fast word learning has also been reported in other studies addressing different aspects of lexical acquisition (McLaughlin et al. 2004; Perfetti et al. 2005; De Diego-Balaguer et al. 2007; Mestres-Missé et al. 2007; Mueller et al. 2008a).
Thus, overall, in adult and infant studies, negative polarity increases in the range of 200–500 ms appear to reflect the word-learning process. But what type of cognitive mechanisms are these ERP modulations reflecting? In the Cunillera et al. (2006) study, the N400 modulation was seen to reflect learners' ability to extract co-occurrence statistics found within a language. Based on their results with streams of tones built as an analogy to Cunillera et al. (2006), Abla et al. (2008) have proposed that the N400 modulation reflects the computation of transitional probabilities, whereas the N1 does not vary as a function of this factor. However, an alternative view would relate the N400 amplitude modulation to the progressive enhancement of a proto-lexical memory trace for the repeatedly encountered new word. The results of a recent study support this interpretation because this progressive amplitude modulation appears as a function of exposure even when segmentation using statistical information was not necessary, because words were pre-segmented by subtle pauses (see figure 2a; see also De Diego-Balaguer et al. 2007). In contrast, the N1 is attenuated when pauses are inserted even in random streams and enhanced for the detection of boundaries based on transitional probabilities (De Diego-Balaguer et al. 2008a), suggesting that it is this component that is likely to reflect the computation of transitional probabilities.
These different electrophysiological studies have also permitted the observation that these learning modulations in the range of the N400 component differ from the lexical-semantic effects observed for the N400 component. While, for the latter, the amplitude of the component decreases when lexical-semantic integration demands are reduced (e.g. with repetition or semantic contextual priming), the learning-related N400 shows the opposite pattern, with progressive amplitude enhancement as a function of increased exposure to the new word (Mueller et al. 2008a). In addition, the topographic distribution of this learning-related N400 component is more frontocentral, whereas the classical semantic N400 typically shows a right centro-parietal distribution (Kutas & Federmeier 2000). These differences in the amplitude modulation and topography indicate that, although these variations are observed in similar latency windows, they might not share the same cognitive processes and neural generators. Longitudinal word-learning designs will allow disentangling these differences and determining the cognitive processes shared for both word processing and long-term consolidation of newly acquired words. Several studies have already shown interesting effects of long-term training of new words as well as the effect of sleep in their consolidation (see Gaskell & Dumay 2003; Cornelissen et al. 2004; Grönholm et al. 2005; Dumay & Gaskell 2007; Tamminen & Gaskell 2008; Davis et al. 2009).
Finally, because speech segmentation to bootstrap linguistic regularities could be considered a crucial step in the chain of language development processes, it might be possible to observe effects of speech segmentation capacities in language development over the long term. However, studies on this issue are scarce. Newman et al. (2006) recently observed that infants' performance on speech segmentation tasks before 12 months of age was related to expressive vocabulary at 24 months. In addition, those children who were able to segment words from fluent speech scored higher on language measures but they did not differ in generalized intelligence. Moreover, early grammar development might be affected by this process as well if words have to be segmented first before the structural dependencies between words can be extracted (De Diego-Balaguer et al. 2007). If the two types of information can be extracted in parallel or the type of computations needed are different, segmentation and grammar acquisition might not show a dependency and may not appear sequentially, one type of knowledge after the other. Overall, these results suggest that speech segmentation abilities might be an important prerequisite for successful language development. In addition, Newman et al. (2006) emphasize the importance of evaluating prelinguistic skills and cognitive development using longitudinal studies.
(c) The role of attention in rule learning and speech segmentation using multiple cues
In natural language, multiple cues converge at the same time and can be exploited in order to segment the acoustic signal into word-like units (Hollich et al. 2000). Because a single cue in isolation is often not fully reliable, the combination of multiple probabilistic cues could help infants and adults in the initial stages of LL. Indeed, during language development, children are very sensitive to metrical sequences, which can help to extract regularities for grammatical acquisition (Jusczyk 1999). In an influential study (Peña et al. 2002), the importance of prosodic information in triggering the appropriate computations for the extraction of rules from language was demonstrated experimentally. The authors used a simplified artificial language, which contained words sharing a structural dependency: the first syllable of a word determined its final syllable (e.g. paliku, paseku, paroku), similar to some morphological rules (e.g. unbelievable, untreatable, unbearable) (see figure 2a). They compared performance with the same material presented either continuously or with subliminal pauses between words (25 ms pauses). While participants were always able to extract the specific words composing the language, they were only able to generalize the embedded rule when the prosodic cue provided by the pauses was present. This interesting result raises an exciting follow-up question concerning how this prosodic information triggers the appropriate computations, or, in other words, how this information changes the way the speech signal is treated.
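The structure of such a language, and what generalizing its rule amounts to, can be sketched in a few lines of Python. The sketch rests on several assumptions that are not in the studies cited: only the frame pa_ku and the middle syllables li, se and ro come from the example above, the two other frames are hypothetical, the symbol "#" merely stands in for the 25 ms pause, and the toy learner that reads off first-to-last dependencies from pause-delimited words is an illustrative device rather than a claim about the computations listeners perform.

```python
import random

# First syllable -> final syllable. "pa" -> "ku" follows the example in the text;
# the other two frames are hypothetical and added only for illustration.
FRAMES = {"pa": "ku", "bo": "te", "du": "mi"}
MIDDLES = ["li", "se", "ro"]  # variable middle syllables

def build_words():
    """All words of the language: one per combination of frame and middle syllable."""
    return [(first, mid, last) for first, last in FRAMES.items() for mid in MIDDLES]

def build_stream(n_words, with_pauses, seed=0):
    """Concatenate random words; optionally append '#' after each word as a pause marker."""
    rng = random.Random(seed)
    words = build_words()
    tokens = []
    for _ in range(n_words):
        tokens.extend(rng.choice(words))
        if with_pauses:
            tokens.append("#")
    return tokens

def learn_frames(tokens):
    """Collect first -> last dependencies from pause-delimited trisyllabic chunks."""
    learned, word = {}, []
    for tok in tokens:
        if tok == "#":
            if len(word) == 3:
                learned.setdefault(word[0], set()).add(word[2])
            word = []
        else:
            word.append(tok)
    return learned

def follows_rule(candidate, learned):
    """True if the candidate's final syllable is licensed by its first syllable."""
    first, _, last = candidate
    return last in learned.get(first, set())

learned = learn_frames(build_stream(200, with_pauses=True))
rule_word = ("pa", "fi", "ku")   # novel middle syllable, dependency respected
violation = ("pa", "fi", "te")   # novel middle syllable, dependency broken
print(follows_rule(rule_word, learned), follows_rule(violation, learned))
```

The pause markers make the alignment of first and last syllables trivial for this toy learner; without them, the continuous stream gives no explicit indication of where a frame begins, which is one way to picture why a segmentation cue might matter for extracting the dependency.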
In a recent study (De Diego-Balaguer et al. 2007), we used artificial languages like those in the Peña et al. study (with pauses) in order to tease apart the electrophysiological responses related to word learning from those associated with the extraction of rules. During the course of learning, adult volunteers exhibited a progressive positive deflection peaking around 200 ms after word onset (P2 modulation) that positively correlated with the listener's ability to generalize the rules embedded in a language. Comparing the variations of the ERP components through the learning process, we could observe that this P2 effect was clearly dissociated from the N400 modulation that appeared earlier in the learning phase (figure 2b).
These results in relation to the P2 component were interpreted considering the effect of attention in biasing LL processes. This attentional bias was initially proposed by Gleitman & Warner (1982) and Echols & Newport (1992), who considered that infants might utilize certain perceptual or attentional processes that allow them to extract salient elements from the stream of language, leaving some elements unattended and reducing the scope of the segmentation word-learning problem. In adults, Ellis (2008) has proposed that in second LL, and particularly for the acquisition of grammatical relations, attention is tuned to enhance the perception of the relevant information. While similar structural patterns may help transfer from L1 to L2, interference will be observed when the new language requires a differential allocation of attention to structural relations. In support of the relation between the attentional bias and the P2 modulation, several studies have shown enhancement of this component for salient stimuli that cued the selection of relevant information (Luck & Hillyard 1994). In the same vein, the P2 modulation was also observed when multiple cues (stress and statistical information) were used and integrated for word segmentation in the same artificial languages described in the previous section (Cunillera et al. 2006, 2008; figure 2c).
This point is crucial if we consider that a number of studies have evidenced that successful extraction of the underlying structural relations requires the presence of cues that could capture learners' attention. Attention might help to select the relevant information that has to be clustered. These cues include, for example, the presentation of clearly segmented words (Gomez & Maye 2005) and the salience of the syllables carrying the critical rule information either by their boundary position (Endress et al. 2005) or by increasing the variability of the irrelevant information (Onnis et al. 2008). Similarly, in our studies, prosodic information (word-stress for segmentation and pauses for rule-learning) could act as task-relevant salient information that helps to capture attentional resources. As proposed by Mueller et al. (2008b), prosody may guide learning by helping to focus on the relevant units. This is consistent with our data from patients with a neurodegenerative disorder (Huntington's disease (HD)), in whom performance on different neuropsychological tests of executive function correlated with rule-learning performance, but not with word learning when the words presented were pre-segmented (De Diego-Balaguer et al. 2008a,b).
In summary, taking the different studies together, attention seems to play a crucial role in the segmentation and rule-learning process (Toro et al. 2005; Pacton & Perruchet 2008). Several accounts have proposed that given the power and usefulness of statistical learning, the same computations applied to the speech input can allow learners to segment and also to extract the rules governing grammatical relations (Pacton & Perruchet 2008; Ellis 2008). These proposals agree on the relevant role of attention to focus on the relevant units (words, clauses, phrases, grammatical categories) where calculations have to be applied. However, the difference here might not reside in the distinction between words and rules but rather in the use of multiple cues to be integrated versus the use of only one type of information, which is in agreement with the ideas underlying the Hybrid model of Hollich et al. (2000).
4. Inferring the meaning of new words using contextual learning
(a) Contextual learning in second language research
In a second series of studies, we aimed to simulate the mapping of existing meanings to new words using information from verbal contexts. This type of learning via guessing is a powerful mechanism that permits the discovery of the meaning of new words throughout the lifespan. This is the case for first and second LL, if learners experience the appropriate conditions (Nation 2001). Indeed, contextual learning could be considered an example of how a general learning mechanism, inductive reasoning, is required for the purpose of LL.
It has been estimated that students in the middle grades encounter between 16 000 and 24 000 new words (Nagy et al. 1987; Nation 2001). Although the estimation of the number of words learned per day differs across authors, a typical child (e.g. 8–10 years of age) might have to learn about six to 12 new words per day. Although it is supposed that many of these words would be learned using contextual information during reading (Durkin 1979), some studies have shown that this ability to derive the meanings of new words from contexts might be more difficult to attain than initially thought (Carnine et al. 2008; for critical reviews, see Carver & Leibert 1995; Landauer & Dumais 1997; Laufer 2003). In order to extract the correct meaning, learners should selectively focus their attention on the relevant conceptual information of the context to correctly guess the meaning of the new word. This ability may depend on individual differences in verbal reasoning and working memory, which might help to ‘pick up’ those relevant aspects of the contexts that could provide the clues to the meaning of the new word (Sternberg 1987; van Daalen-Kapteijns et al. 2001). Interestingly, Chaffin et al. (2001) studied the inference of meaning from context using eye-movement recording. These authors found that reading times for informative regions of the context were longer than for neutral or non-informative ones. This result implies that participants were able to quickly adapt their reading strategy depending on the relevance of the information for contextual word learning.
Some authors have proposed that the way in which the different studies on child language evaluated the degree of learning of new words may not be appropriate and that children accrue only partial semantic knowledge for the new words that will be filled and completed with additional exposures (Landauer & Dumais 1997; McGregor et al. 2002). This ‘slow mapping’ process is thought to occur in children between several new words and their corresponding referents and meanings. Although this idea contrasts with the well-known fast mapping process (Carey & Bartlett 1978), slow mapping may be triggered subsequently to the fast mapping process. Initially, a fragile new word representation might be created in the lexicon and the child could begin to hypothesize about its meaning, updating this semantic representation until it perfectly maps the relationship between the word, the referent and its related concepts.
From this perspective, word learning is considered to be an incremental process in which word representations are progressively developed and refined over time through multiple exposures (Bloom 2000; McGregor et al. 2002). Interestingly, this incremental learning process should be susceptible to learning and forgetting of various semantic attributes, in the sense that further encounters with a new word might reject some of the false conceptual attributes initially guessed or attach new ones (Werner & Kaplan 1952; McKeown 1985). This initial grasp of the meaning of a novel word might aid its placement in semantic space by indicating its similarity to already existing and established lexical entries. Then, readers might selectively allocate their processing resources to each region of the sentence depending on the semantic hypothesis generated and the amount of pertinent information available in that region (see Morris & Williams 2003).
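The incremental account outlined above can be made concrete with a toy sketch. It is purely illustrative and is not a model proposed in the studies cited: the feature labels, the `gain` and `decay` parameters and the function names are all hypothetical assumptions. Each exposure strengthens the semantic attributes supported by the current context and weakens previously guessed attributes that the context does not support, so that false attributes fade while consistent ones accumulate.

```python
from collections import defaultdict

def update_hypothesis(weights, context_features, gain=1.0, decay=0.5):
    """Return updated attribute weights after one exposure to a new word.

    weights: dict mapping a semantic attribute to its current support.
    context_features: attributes suggested by the current context.
    gain and decay are arbitrary illustrative parameters (assumptions).
    """
    updated = defaultdict(float)
    # Attributes guessed earlier but unsupported now are weakened.
    for feature, weight in weights.items():
        updated[feature] = weight if feature in context_features else weight * decay
    # Attributes supported by the current context are strengthened or attached.
    for feature in context_features:
        updated[feature] += gain
    return dict(updated)

def learn(contexts, threshold=1.0, gain=1.0, decay=0.5):
    """Accumulate attributes over successive contexts and keep the stable ones."""
    weights = {}
    for context in contexts:
        weights = update_hypothesis(weights, set(context), gain, decay)
    return {f: w for f, w in weights.items() if w > threshold}

# Three hypothetical exposures to the same novel word.
contexts = [
    {"vehicle", "has_wheels", "red"},
    {"vehicle", "has_wheels", "noisy"},
    {"vehicle", "has_wheels"},
]
print(learn(contexts))
```

In this sketch the attributes shared by all three contexts end with the highest support, whereas the incidental ones ("red", "noisy") decay below the threshold, mirroring the idea that later encounters reject some initially guessed attributes.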
(b) Time-course analysis of contextual learning
Contextual learning tasks provide a good measure of the ability to extract particular components of a word's meaning as well as the ability to differentially select the right set of semantic components from the vast amount of potential relational information across the sentences. To use a figurative analogy introduced in Nagy and Gentner's contextual learning study (second experiment): ‘This experiment might be likened to dipping a magnet into a mixture of iron filings and sand: the iron filings should stick and the sand should fall off’ (Nagy & Gentner 1990, p. 188). Similarly, the appropriate semantic features of the context have to be attached to the magnet, which might largely depend on the grammatical class of the new word.
To study how word meaning is determined online during reading, we devised a word-learning task that mirrored a reading situation in which three-sentence contexts constrained or did not constrain the meaning of a new word (Mestres-Missé et al. 2007) (see table 1). In constrained coherent contexts (new word meaning condition), we created a learning context in which the three sentences referred to the same meaning, while in the other condition (new word no-meaning), the sentences referred to different concepts and participants could not infer the meaning of the new word. In the self-paced reading and ERP experiments, we observed a very similar pattern of gradual acquisition of the meaning of the new word (see figure 3a,b). While clear differences were observed in the first sentence for new word conditions against a control real word condition, this difference gradually disappeared across the next two sentences between the new word meaning condition and the real words. New words in coherent contexts developed a gradual reduction of the N400 that was not observed for new words embedded in incoherent contexts. Interestingly, at the end of the second sentence, a significant difference was already present at the N400 range between new words in coherent and incoherent contexts (figure 3b). This pattern of reading times is also in consonance with other studies where readers tend not to spend more time on a particular noun when it refers to a concept that was previously, but not explicitly, mentioned in the sentence (O'Brien et al. 1988). In this regard, the lack of differences between the new word meaning and real-word condition, at the end of the third sentence, is in agreement with the idea that the concept was already inferred at that position.
In summary, these experiments could be interpreted in accordance with the incremental learning idea that the semantic attributes of the new word are gradually built into the semantic space.4 In the same line, Borovsky et al. (2008) studied ERP responses to subsequent plausible and implausible usages of a new word that had been previously presented in a one-sentence exposure (weakly or strongly constrained). A reduction of the N400 was observed when the new word was introduced in a new plausible sentence compared with the implausible one, but only when the pre-exposure was in a strongly constrained sentence. These results show that a single exposure to a novel word and its context might be enough to influence the ability to judge a word's appropriate usage in subsequent contexts, but only if the first context where the novel word was encountered is highly informative.
In agreement with this idea, we further observed that the association of the new word with its meaning showed N400 semantic priming effects (see figure 3c; see similar N400 priming findings in Perfetti et al. 2005). Interestingly, the latency of this priming effect was delayed (about 150 ms) compared with the priming effect in real words and the distribution of the N400 effects was more frontocentral than the effect observed for real words (figure 3c, right). One of the explanations for these differences could be that the new conceptual relations attached to the newly learned word are still weak and, therefore, an increase in cognitive control might be required to guide semantic knowledge retrieval and selection (Krashen 1982; Bialystok & Sharwood-Smith 1985; Kroll & Stewart 1994; Bialystok 2001; Rodriguez-Fornells et al. 2006).
5. A functional neuroanatomic model of LL: an integrative account
The present section proposes an integrative brain functional anatomic account of word learning (figure 4) based on three main ideas: (i) the dual-stream model of language processing (Hickok & Poeppel 2000), (ii) the role of the medial temporal lobe (MTL) structures in initial storage of information and further consolidation (Nadel & Moscovitch 1997), and (iii) the recruitment of several brain regions involved in cognitive control processes during second LL (Krashen 1982). The model is partially based on previous neuroimaging data gathered using the LL paradigms described in the previous sections (mostly speech segmentation and contextual word learning). Other parts of the model, and especially the neuroanatomical interconnectivity of the different regions involved in LL, are based on previous studies in which these issues have been addressed (mostly from neuroanatomy, neurophysiology, diffusion tensor imaging (DTI) and intraoperative electric stimulation studies in humans and other species). In this regard, parts of the model remain speculative and have to be considered as heuristics that can stimulate and guide future research on LL and its functional and structural connectivity.
As depicted in figure 4, a learning context is considered as an experience in which a learner is exposed to one or more unknown items (including multiple linguistic and extra-linguistic cues). This information or parts of it could be presented repeatedly. From this context, multiple and parallel processing streams might be triggered for (i) encoding the phonological representation of the new item, (ii) briefly retaining this representation and manipulating it in short-term memory, (iii) long-term consolidation of this representation, and (iv) attaching conceptual representations to this form or inferring the meaning conveyed in the learning context.
We briefly outline in the following sections the importance of the three streams in LL: (i) the dorsal audio-motor interface, (ii) the ventral meaning integration interface, and (iii) the episodic-lexical interface in LL processes. We also discuss the involvement of other cognitive functions (e.g. attention, maintenance and manipulation of information, and inductive reasoning) that can aid the previous processing streams in several LL processes.
(a) Dorsal audio-motor interface
This pathway is involved in mapping sounds into articulatory-based representations (Hickok & Poeppel 2000, 2004, 2007; Scott et al. 2000; Wise et al. 2001; Scott & Wise 2004). It engages the left posterior temporal regions and the parieto-temporal boundary, as well as the frontal regions sustaining motor speech representations (figure 4). In relation to LL, Hickok & Poeppel (2007) have recently put forward the hypothesis that this interface might be involved in the acquisition of new vocabulary. This process might involve generating a new sensory representation of the novel word and keeping this auditory representation in an active state (i.e. phonological short-term memory, Buchsbaum et al. 2005). At the same time, this newly created trace might guide the production of motor articulatory sequences. This proposal converges with the motor theory of speech perception (Liberman & Mattingly 1985). Several recent studies have evidenced the implication of several areas of this dorsal interface in learning new phonological contrasts (Golestani & Zatorre 2004; Golestani et al. 2007; Wong et al. 2007; Mei et al. 2008).
The first evidence in favour of the involvement of this dorsal interface while segmenting speech was published by McNealy et al. (2006). The authors showed that a left frontotemporal network is activated during the online speech segmentation process (see figure 5a,b). In particular, while listening to the language streams, the activation of the superior temporal gyrus (STG) was found to increase over time along with the recruitment of the premotor cortex (PMC) and other regions of the dorsal pathway. In order to gain more insight into the involvement of the audio-motor interface, we recently investigated the specific role of the PMC in speech segmentation. Using the same type of language and random streams as in the ERP experiments (figure 1a), we observed activation in the PMC in the language conditions during speech segmentation (see figure 5c). Besides, we showed that the performance in the word-segmentation task correlated with the activation in the left PMC only during the first 2 min of learning (Cunillera et al. 2009) (see figure 5d). This result is in convergence with the selective involvement of the PMC at the early stages of implicit motor sequence learning (Doyon et al. 2002) and the implication of the premotor and motor cortex in passive speech perception (Fadiga et al. 2002; Wilson et al. 2004; Meister et al. 2007).
As stated by Warren et al. (2005), a possible mechanism for explaining these results would be that a template-matching algorithm in the posterior STG allows for the detection of coincidences between the stored auditory memory representations or ‘phonological templates’ derived from previous exposures and the new incoming stimuli. Then, the output of this process should be an ordered sequence of auditory representations that could be forwarded to the PMC. Based on previous studies using DTI (Catani et al. 2005; Saur et al. 2008) and intraoperative direct stimulation (Duffau 2008), the best candidate to support this communication between the posterior STG and the PMC might be the superior longitudinal fasciculus, most probably its lateral part, which runs parallel to the arcuate fasciculus and passes through the supramarginal gyrus (Catani et al. 2005; Duffau 2008).5 In the premotor areas, the encoded sequence of sounds might be mapped into a sequence of articulatory gestures, which would keep the segmented words active through a rehearsal mechanism. The importance of the rehearsal component of verbal working memory is also evident when participants are asked to learn the same type of artificial language streams while performing an articulatory suppression task. In this condition, which impedes the use of rehearsal, segmentation is blocked when compared to a simple auditory interference condition (Lopez-Barroso et al. 2009). This rehearsal mechanism might be crucial in the first stages of learning unfamiliar new words (Baddeley et al. 1998), reflecting the involvement of the rehearsal component of the phonological working memory when segmenting possible words. Indeed, Scott & Wise (2004) explicitly suggested that this dorsal stream might be very important in language acquisition, highlighting the important role of rehearsal during new word learning.
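The template-matching idea can be illustrated with a deliberately simplified sketch. It is not the algorithm of Warren et al. (2005) nor a claim about how the posterior STG computes; the syllables, the stored templates and the function name are hypothetical assumptions. Stored 'phonological templates' are compared against an incoming syllable stream, and the output is an ordered sequence of detected coincidences, of the kind that could in principle be forwarded for rehearsal.

```python
def match_templates(stream, templates):
    """Scan a syllable stream and return ordered (position, template) matches.

    stream: list of syllables in order of arrival.
    templates: iterable of syllable sequences stored from previous exposures.
    Longer templates are tried first; empty templates are ignored.
    """
    candidates = sorted((tuple(t) for t in templates if t), key=len, reverse=True)
    matches = []
    i = 0
    while i < len(stream):
        for template in candidates:
            # A coincidence is detected when the incoming syllables
            # reproduce a stored template exactly.
            if tuple(stream[i:i + len(template)]) == template:
                matches.append((i, template))
                i += len(template)
                break
        else:
            # No stored template starts here; move on to the next syllable.
            i += 1
    return matches

# Hypothetical stream containing two previously stored trisyllabic 'words'.
stream = "pi ru ta ba go li to ku da pi ru ta".split()
templates = [("pi", "ru", "ta"), ("to", "ku", "da")]
print(match_templates(stream, templates))
```

Exact matching is used only for simplicity; a graded similarity measure would be a natural alternative, but that choice is outside what the text specifies.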
Using similar ideas as in feed-forward motor control models (Desmurget & Grafton 2000), this interface could allow the creation of an internal efference copy of the possible representation of the articulated word, which might be very useful to the language processing system for prediction purposes. This feed-forward prediction mechanism could be triggered when learning new words, with the PMC acting as a fine-tuning ‘top-down’ mechanism that regulates the template-matching process engaged in the posterior STG. Because of the implication of the premotor and the inferior frontal cortex in action selection, the existence of this type of predictive representations is very plausible as a monitoring device that checks the output before sending it out to the articulators. Similar proposals have also been raised in language production models (Dell & Reich 1981; Dell 1986) or in implicit motor skill learning (Poldrack & Willingham 2006). Notice that tracking of the predictability of upcoming attended stimuli might also be very important in speech segmentation. In fact, it has been shown that PMC responds to the prediction of auditory events in a structured sequence (Gelfand & Bookheimer 2003; Schubotz et al. 2003).
It is worth considering to what degree this auditory-motor learning network could also be involved in language acquisition in infants (Doupe & Kuhl 1999; Warren et al. 2005). Indeed, this auditory-motor interface is postulated to be very important in speech development, because speaking inherently requires fine-tuned motor learning. Language perception and production in the developing infant brain require a specific tuning to the language sounds encountered during the first year of life. First words imitated by a child are guided by the ‘gestural’ features of the sound, i.e. by the actions of the mouth rather than by a sound's acoustic features (Studdert-Kennedy 1987). Importantly, the outlined auditory-motor pathway must also be related to the brain network subserving imitation of simple movements (Iacoboni et al. 1999). In fact, the ability to mimic sounds seems essential for language development. This idea has been revitalized by the discovery of mirror neurons, recorded in macaques in the homologue of the ventral PMC region (including the superior part of Broca's region) and in humans (Rizzolatti & Arbib 1998). These specific audiovisual mirror neurons discharge not only when performing and observing a specific action, but also when hearing a specific sound representative of the observed action (Kohler et al. 2002). Mirror neurons also provide a mechanism for integrating perception and action at the neuronal level, which, at the same time, might contribute to various developmental processes such as the imitative behaviour of infants (and the necessity to integrate perceived and performed actions; Meltzoff & Decety 2003) and communicative acts (Rizzolatti & Arbib 1998).
(b) Ventral meaning integration interface
The involvement of the meaning interface in LL has a broader scope when compared to the previous interface, in the sense that it might be triggered even in contexts in which there are no specific new word tokens to be learned. Let us imagine that in the experiment we performed when inferring meaning (Mestres-Missé et al. 2007; see table 1), instead of a new word at the end of a sentence we had presented an abstract figure. Participants would have engaged the process of meaning inference independently of the nature of the item presented.
We believe that this interface is an endowed mechanism set up to infer meaning by exploiting internal and external sources of information. This mechanism might be triggered when a specific demand is present to infer the meaning of a situation, new word, discourse, etc. or when conflicting conceptual information exists that requires a specific solution. As a self-triggered learning mechanism, internal reinforcement might depend on the capacity to reduce conflict and uncertainty through the correct inference of the meaning of a specific learning context (Berlyne 1960). Notice that this approach is similar to the ideas previously presented in infant learning in relation to the utilization of multiple extralinguistic and pragmatic cues in order to disambiguate LL (Bloom 2000; Hollich et al. 2000; Tomasello 2003). Thus, the intrinsic drive to meaning integration might require the participation of multiple cognitive processes related to semantic and conceptual analysis of the information conveyed in the learning context. A similar mechanism might be required in ambiguous communication environments where full syntactic parsing is not possible (Ferreira 2003) or when processing sentences that contain semantically ambiguous words (Rodd et al. 2005).6
The existence of this interface is in agreement with the proposal of the ventral language stream by Hickok & Poeppel (2007). This stream would be selectively activated in meaningful learning contexts and might involve non-traditional perisylvian regions, comprising the medial, inferior and anterior temporal cortex (i.e. STG and sulcus, angular gyrus, the posterior inferior and middle temporal gyrus (MTG), anterior temporal pole) and the ventral inferior frontal gyrus (vIFG, triangular region, BA45) and more orbital, medial and ventrolateral aspects of the prefrontal cortex. The interconnectivity within this network and other brain regions might be mediated by the inferior longitudinal, the inferior fronto-occipital (occipito-temporal, sub-insular of frontal branch) and the uncinate fasciculus, which connects the anterior temporal and ventral prefrontal regions (Duffau et al. 2005). These pathways radiate out important information from occipital visual regions related to object processing and most probably are involved in the link between object representations and lexical labels (Mummery et al. 1999).7 However, a recent DTI study suggested instead that the white matter ventral pathway that crosses through the external capsule, conveying information from two different pathways, the middle longitudinal fasciculus and the inferior longitudinal fasciculus, might be implicated in the iterative exchange of information between the middle and inferior temporal regions and the ventral prefrontal cortex required for accessing lexical, semantic and conceptual representations (Saur et al. 2008). Indeed, these authors hypothesize that this pathway might be crucial for infants to derive meaning and construct conceptual knowledge.
In relation to the functional role of the prefrontal regions in semantic processing, there is agreement in the literature about the role of these regions in selection or controlled semantic retrieval (Badre & Wagner 2002; Gold et al. 2006; Thompson-Schill et al. 2006). For example, the anterior left IFG (pars triangularis, BA 45 and pars orbitalis, BA 47) has been associated with elaborate semantic processing (Petersen et al. 1988; Demb et al. 1995; Ferstl et al. 2008) and alternatively, selection from competing semantic features (Thompson-Schill et al. 1997).
The meaning inference paradigm (table 1) provides a good opportunity to evaluate the implication of this interface in LL. In Mestres-Missé et al. (2008), we observed that inferring the meaning of a new word from a verbal context selectively recruited several brain regions of this stream, most importantly the left anterior IFG and the left MTG. In addition, the right and left parahippocampal gyrus, the anterior cingulate gyrus and several subcortical structures (e.g. bilateral thalamus, bilateral caudate and left putamen) were also activated (see figure 6a,b).
The large activation observed in the MTG might reflect the activation of stored conceptual information that is necessary to infer the meaning of the new word from the context. This idea of the access of semantic information in MTG is in agreement with the proposal that this region might be a supramodal semantic processing region (Vandenberghe et al. 1996; Price 2000; Lindenberg & Scheef 2007; Patterson et al. 2007) involved in (i) the storage of long-term conceptual knowledge, segregated from word-form representations (Petersen et al. 1988; Martin & Chao 2001), (ii) lexical-semantic processing (Damasio et al. 1996; Ferstl & von Cramon 2001; Keller et al. 2001; Baumgaertner et al. 2002; Dronkers et al. 2004; Indefrey & Levelt 2004), (iii) the activation of visual forms and word meanings (Howard et al. 1992; Pugh et al. 1996; Hagoort et al. 1999), and (iv) increased semantic integration demands (Damasio et al. 1996; Just et al. 1996; Ni et al. 2000; Baumgaertner et al. 2002). In the novel word conditions, a set of initial candidate semantic features might be activated based on the information conveyed by the first sentence, and this primed semantic space might be narrowed down during the second and third sentences or with further exposures to the new word in different contexts. This process of zooming into the semantic space of the new word might be mediated by the interplay between the MTG and the ventral IFG, the latter area being involved in guiding semantic selection/retrieval processes via top-down modulations (Badre & Wagner 2002; Gold et al. 2006; Rodd et al. 2005). However, this mechanism of semantic selection and retrieval might require the monitoring of conflict between candidate meanings or lexical items pre-activated in the semantic network and the final selection of the best fitting candidate concept. This monitoring and selection of the final lexical candidate is probably also mediated by the anterior cingulate–striatum–thalamic loop (see below).
Finally, inductive reasoning might also be an important process aiding this ventral stream of meaning integration. As proposed by Gentner & Namy (2004), analogical reasoning might underlie the comparison processes that are triggered when multiple experiences with specific commonalities and inherent abstract relations are presented in LL. As outlined in figure 4, we have included the middle frontal gyrus (MFG) (BA10) and more posterior dorsolateral regions (encompassing BA46 and part of BA9) as non-specific regions aiding LL processes, which could also be activated for example during rule learning (Goel & Dolan 2000; Strange et al. 2001), or when it is necessary to infer the abstract relations between certain elements. Notice that this region was also activated in the speech segmentation studies (figure 5a,b) (see also McNealy et al. 2006) and therefore, it might provide a high-level cognitive structure involved in inductive and abstract reasoning, fluid intellectual processes and more general problem solving (Gray et al. 2003). These regions might also be involved in the integration of relationships among various stimulus dimensions (e.g. analogy, induction, controlled episodic retrieval processes), which have been maintained and accrued in the working memory system (at more posterior prefrontal regions, see Chein et al. 2003). Indeed, inductive reasoning shows larger activation in dorsolateral regions when compared to deductive reasoning (Goel & Dolan 2004). Corroborating this idea, in a recent study on meaning inference of verbs and nouns (Mestres-Missé et al. in press), we observed that those participants who inferred more meanings from the context showed larger activation in these left anterior prefrontal regions (BA10/46) (figure 7a).
(c) Episodic-lexical interface
One important aspect of word learning is that the meaning of a new word can be guessed or mapped with very few presentations (Carey & Bartlett 1978). This fast mapping process has also been observed for learning facts and is present in infants and adults (Markson & Bloom 1997), which speaks in favour of a clear preservation of these mechanisms across the lifespan. In opposition to classic associative learning mechanisms that require multiple exposures, this type of fast (single-trial) learning has been traditionally assigned to declarative memory, which relies on the MTL region, including the hippocampus, the parahippocampal, the entorhinal and the perirhinal cortex. Owing to the rich connections existing in non-human primates between the superior temporal regions and the polysensory MTL (Lavenex et al. 2002) (see figure 4), the possible participation of these structures in the initial fast-mapping of new words and their meanings could be very plausible. The involvement of the MTL region in the acquisition of new lexical knowledge has also been proposed in the declarative/procedural model by Ullman (2001). Furthermore, the implication of the MTL structures and, in particular, the hippocampus in processing novelty, and specifically novel verbal stimuli, is well known (Saykin et al. 1999; Strange et al. 1999).
These structures have been traditionally considered to be implicated in the storage of episodic memories, although the mechanisms involved in the long-term consolidation of these memory traces have been under dispute (McClelland et al. 1995; Nadel & Moscovitch 1997; Squire et al. 2004). An interesting aspect to consider is to what degree new words, which will become in the long run integrated and represented in the mental lexicon (L1 or L2) with the corresponding links to conceptual representations, will require the same type of learning processes as for other types of episodic traces (e.g. autobiographical events). For example, it has been proposed that during speech perception, detailed episodic traces of spoken words and non-words are created and remembered for considerable periods (Goldinger 1998). Besides, the famous patient H.M., characterized with anterograde amnesia following bilateral removal of the hippocampus together with the entorhinal and part of the perirhinal cortices, was impaired in the acquisition of new lexical information after the lesion (e.g. the word ‘xerox’, Gabrieli et al. 1988), but not in general lexical and grammatical processing tasks (Kensinger et al. 2001). Based on this study and similar ones, it has been proposed that the MTL region is involved in the acquisition of new semantic information (Bayley & Squire 2005).
However, this interpretation is in conflict with the study reported by Vargha-Khadem et al. (1997) and more recent studies in which it was observed that H.M. did indeed acquire new factual information (e.g. names of people who became famous after the onset of his amnesia) (O'Kane et al. 2004). In the study of Vargha-Khadem et al., three young amnesic patients with selective hippocampal damage early in life were examined and, despite their episodic memory deficits, these patients showed nearly intact language competence, literacy and general factual knowledge. The authors proposed that the hippocampus plays a selective role in the creation of episodic memories, whereas the surrounding cortical areas (entorhinal, perirhinal and parahippocampal gyrus) could support the acquisition of new lexical and semantic information, even in the absence of the hippocampal area (see also Bayley & Squire 2005; De Haan et al. 2006). In favour of this differential participation of the MTL structures in semantic and episodic memory, Davies et al. (2004) showed that a group of patients with semantic memory impairments (Semantic Dementia) had less volume in the left anterior temporal pole, perirhinal cortex and anterior entorhinal cortex, while patients with Alzheimer's disease showed marked reductions in the hippocampus and the posterior entorhinal cortex.
In the model presented in figure 4, we adopt the idea presented by other authors that the initial binding of new word representations into a memory trace should be MTL dependent (Ullman 2001; Squire et al. 2004; Gaskell & Davis this issue). However, considering Nadel & Moscovitch (1997), we propose that the long-term consolidation process would be different for episodic (context-dependent) and linguistic (context-free) traces. In the case of episodic events, the additional rehearsal of a specific trace (which could be exclusively achieved through pure neocortical traces or by relying on MTL-neocortical synaptic connections) will enhance the spatio-temporal context of the event or trace. In contrast, further encounters with a newly acquired word and its rehearsal will gradually weaken the trace of the spatio-temporal context in which the new word was presented and will enhance its associations to conceptual information (becoming a context-free trace). The differences in both the rehearsal and the long-term consolidation processes will be due to the specific requirements of the linguistic traces. While the efficiency of a consolidation process in episodic autobiographical memory depends on the capacity to frame a trace as accurately as possible into a specific context, for a new word trace, the emphasis will be on the association of the new word with its meaning or the conceptual features that define it.
This trade-off in the MTL system between context rehearsal and conceptual rehearsal might explain in the long run the apparent division of declarative memory into episodic and semantic memory in humans. It is worth mentioning that a similar idea on how memory representations might change with the passage of time, becoming more semantic/fact like (context free) and less episodic (context dependent), has been proposed by Bayley et al. (2003). The authors proposed this idea in order to explain how autobiographical memories become independent of MTL structures and also some observations in amnesic patients in which autobiographical memories became integrated in a type of semantic-like representation and totally disconnected from their original spatio-temporal context (see also Cermak 1984).
Considering this proposal and previous neuroanatomic differences in semantic and episodic memory (Vargha-Khadem et al. 1997; Kensinger et al. 2001; Davies et al. 2004; O'Kane et al. 2004; Bayley & Squire 2005; De Haan et al. 2006), a prediction from our LL model would be that the hippocampus and the entorhinal cortex (posterior sides) might be more involved in the initial stages of word learning, while further exposures and consolidation processes would recruit more anterior entorhinal sides, perirhinal and parahippocampal MTL regions. This time-dependent recruitment of the regions encompassing the MTL system would support the hypothesis of the trade-off in the MTL system between context rehearsal and conceptual rehearsal when storing new words and their meanings. Furthermore, this idea is also congruent with the somewhat preserved ability to store new factual information in amnesic patients, at least when sufficient exposure to and repetition of this information are available (as might be the case for famous people or events) (O'Kane et al. 2004). As suggested by several authors, the storage of this new semantic information might depend on preserved parahippocampal and perirhinal cortex (Vargha-Khadem et al. 1997), which might allow a more gradual and slow storage of context-free information. Indeed, discovering the meaning of a new word might depend on the selection of the common conceptual attributes present in different contexts (learning experiences), and therefore, these spared regions in the MTL (parahippocampal and perirhinal cortices) might be responsible for this type of slow and more gradual learning process. Interestingly, when damage to the MTL region extends to the perirhinal and partially to the parahippocampal cortices, no evidence of learning of new semantic information has been shown in a densely amnesic patient (Bayley & Squire 2002, 2005).8
Several studies related to word learning have already provided compelling evidence for the involvement of the MTL (hippocampus and parahippocampus) regions in word learning (Breitenstein et al. 2005; Mestres-Missé et al. 2008; Davis et al. 2009). In our previous study on incremental word learning through contextual information, we observed larger activation in the anterior portion of the parahippocampal gyrus when compared to the incoherent condition. The involvement of this region in this type of more incremental and semantic-based word learning is in agreement with the ideas previously exposed (Vargha-Khadem et al. 1997; O'Kane et al. 2004; Bayley & Squire 2005). Besides, larger activation was encountered in the incoherent condition (where no meaning could be derived for the new word) in the left posterior parahippocampal gyrus (Mestres-Missé et al. 2008; see figure 6b). We interpreted this dissociation between anterior and posterior parahippocampus with the proposal that anterior regions within the medial-temporal lobe are predominantly involved in encoding, whereas the posterior regions subserve retrieval (Lepage et al. 1998; Saykin et al. 1999). Furthermore, in a more recent study using a larger sample, we observed a clear correlation between successful meaning extraction of new words and hippocampal activation (Mestres-Missé et al. in press) (figure 7b). Participants with a larger word-discovery rate showed larger activations in the right and left hippocampal regions. These data and previous studies (Breitenstein et al. 2005; Davis et al. 2009) confirm the implication of the hippocampus in the initial stages of learning a new word. Interestingly, further studies might be required in order to understand to what degree these individual differences observed in word learning are also dependent on the underlying white-matter pathways that interconnect these MTL regions. For example, in a recent behavioural-DTI study, Fuentemilla et al. (2009) showed that individual differences in the amount of recognition and recall of previously presented words were associated with the microstructure of the inferior longitudinal fascicle, the major white-matter connectivity pathway of the MTL, extending from the ventral and lateral temporal regions to the posterior parahippocampal gyrus (Schmahmann et al. 2007).
(d) Interaction between the three LL streams of processing
It is interesting to understand to what degree the three interfaces might interact with one another and what learning circumstances might trigger their cooperation. For example, an open question (see §3) is what might happen when a new word is learned (e.g. as an output of speech segmentation) but has not been associated with any meaning. The status of this trace will depend on the internal connections between the posterior STG, the phonological-articulatory trace (PMC) and the MTL. In fact, lexical representations might be stored in the brain separately from the MTL structures, and the new word might interact with the existing representations (phonological similarities, etc.) (Clay et al. 2007).
The interrelation between the meaning interface system and the episodic-lexical interface also deserves some attention. For example, it is possible that the engagement of inference reasoning at anterior prefrontal regions might be inversely related to the activation of MTL regions. The process of inference does not require intact storage of the elements presented because it privileges the similarities, not the uniqueness, of the items encoded. For example, Goel & Dolan (2000) showed that inductive rule-learning processing showed an inverse correlation between medial frontal activation and hippocampal activation when encoding new-animal tokens. The balance between the MTL and MFG regions could explain the trade-off between processing similarities and inferring rules versus encoding specific tokens or words. Indeed, individual differences in LL could depend on the weight that the learner gives to processing information with one stream or the other.
In this regard, sentence comprehension processes create a type of gist semantic representation that is independent of the specific verbatim lexical representations (Bransford & Franks 1971; Brewer 1977; Graesser et al. 1994). Models of discourse processing assume that we understand language using not only local word-level information but also attending to the wider meaning of the sentential context (Singer 1994). This idea is further supported by memory models that provide evidence for the creation of verbatim and gist-type representations, which might further influence true and false memory recall processes (Brainerd & Reyna 2002). In this regard, the meaning interface is an ideal system to create these gist-semantic representations, which might be integrated and used in anterior frontal regions to infer common relationships and abstract ideas.
A last point when discussing the participation of the MTL structures is the rich interconnectivity that exists with medial diencephalic, ventral striatum and midbrain reward processing circuits in the brain. This interconnectivity between the MTL and this emotion–motivational network could be acting as a feedback rewarding mechanism and might explain the role of emotion and motivation in LL. This is a very interesting avenue for future research, especially for contrasting feedback-extrinsic learning in adults versus less feedback-dependent learning in infants (Tricomi et al. 2006).
(e) The integrative role of the basal ganglia
Aside from the direct connections between the three streams, basal ganglia structures occupy a privileged position to hold an integrative function between the different LL streams. The striatum (caudate and putamen) acts as a funnel receiving inputs from different neocortical areas responsible for distinct cognitive functions and sending its outputs back to the cortex through the thalamus forming different functional loops (Middleton & Strick 2000). There is evidence indicating that different substructures of the basal ganglia, including the thalamus, are implicated in executive functions, attention (Couette et al. 2008) and storing and rehearsal in verbal working memory (Chang et al. 2007). These functions are important in the course of learning and the role of these subcortical structures particularly in the acquisition of sequences and categorization is well documented (Seger 2006). Because language is sequential by nature, a preponderant subcortical role has been proposed for the acquisition of the different information from language (Lieberman 2000).
Concerning the different LL streams introduced in this section, areas of the dorsal audio-motor stream (SMG, vPMC and vIFG), the ventral meaning stream (middle and inferior temporal gyri, and vIFG) and the episodic-lexical interface project to the caudate nucleus and anterior putamen. The PMC, involved also in the audio-motor stream, projects to the medial putamen (Leh et al. 2007). Recent evidence indicates that circuits involving the striatum show a great deal of interaction (Yin & Knowlton 2006), reinforcing the idea that this structure may have a key position to integrate inputs from different streams. Because of that, several authors have proposed that the basal ganglia (and thalamus) are engaged in modulating, gating and controlling information flow leading to the selection of appropriate items or behaviours (Haber & Calzavara 2009). For example, in the lexical-semantic interface, they might be responsible for the selection of adequate semantic-lexical items as well as for the inhibition of other candidates (Crosson et al. 2003). In our word-learning study with contextual information (figures 6 and 8), the direct comparison between meaningful and meaningless words yielded differences in the left thalamus. Moreover, a strong correlation was encountered between this region and the percentage of correct meanings reported (see figure 8). The critical role of subcortical structures for the extraction of meaning from context has also been confirmed by the impairment of HD patients with early striatal degeneration (Nogueira Teixeira et al. 2008).
In word segmentation and rule extraction, basal ganglia might be important to maintain the attentional bias to the relevant information as mentioned in §3c. The general cognitive control role fits also well with the results from the extraction of other types of information from speech. HD patients also showed LL difficulties particularly for the acquisition of the rules embedded in the speech stream (De Diego-Balaguer et al. 2008b). This is consistent also with the idea that these cortico-subcortical circuits are necessary whenever control is required, as for example during learning or when ambiguity or violations are present in language (Friederici & Kotz 2003; Wahl et al. 2008). However, more research is needed to clearly disentangle the role of the different structures (striatum versus thalamus) within the basal ganglia function.
6. Concluding remarks
The present word-learning simulations were designed to zoom into the LL processes that are difficult to observe under natural or more ecological learning conditions. With that purpose we concentrated on two problems, the isolation of words and rules in continuous speech (speech segmentation) and the mapping of new words onto existing meanings (inferring meaning from verbal contexts). Both problems were evaluated using complementary behavioural and neuroimaging techniques. We further developed an integrative functional connectivity model of three neurophysiological streams of processing in LL experiences.
We proposed the existence of (i) an audio-premotor interface, which is conceived as an internal audio-motor simulator that could be very important in initial learning of new word phonological forms, (ii) the meaning integration interface, which is envisioned as a mechanism involved in inferring meaning using multiple internal and external cues, and (iii) the episodic-lexical interface, which is in charge of fast-mapping of new words onto specific contexts and long-term consolidation of this new trace into the lexicon(s). The interaction between these systems is modulated by common cognitive control and high-level functions, such as the middle prefrontal cortex involved in inductive reasoning, the striatum–thalamus subcortical circuits involved in the coordination of the different streams and the reward/motivation and feedback processing system. The specific study of these neural circuits and their connectivity, as well as their developmental milestones, might help to understand the sensitive time-windows of each of these LL streams. This information will clarify to what degree adults are able to use these mechanisms for learning new languages. These streams of LL are not proposed to be specific for LL as they might also subserve learning in many other domains, such as music learning.
One important caveat to consider is that a reliable cognitive neuroscience account of adult and infant LL should not be directed exclusively to delineate the neural networks that support these processes. The crucial combination of online techniques (e.g. EEG, oscillatory activity, MEG) provides more subtle measures that allow researchers to understand the dynamics of these networks and the covert cognitive processes involved in learning. The combination of these tools will allow researchers to test empirical questions derived from new integrative models and accounts of the ‘complex learning language-problem’. Besides, LL simulation experiments might also help to understand language processing in general; in fact, one can envision them as stimulation devices or windows into the inner structure of language (see Gaskell & Dumay 2003; Clay et al. 2007; Mestres-Missé et al. 2009).
At the end of the word-learning journey, we will be better prepared to predict how successful an infant and an adult would be when faced with an unknown language. The study of the neural networks and functional connectivity might have clear implications for clinical language neurorehabilitation of infant and adult language disorders (Cornelissen et al. 2003). Several studies have evaluated the effect of language training using different neuroimaging tools in anomic patients (Cornelissen et al. 2003; Grönholm et al. 2007).
Finally, brain plasticity in the adult brain is a new challenge for cognitive neuroscience. Although we have learned during the last century that brain plasticity was largely reduced in adults, it is also true that new discoveries about neurogenesis and epigenetics in adults might open the possibility that some claims about LL will be slowly changing. An intriguing question is the relationship between LL, brain plasticity and bilingualism. As LL can be considered one of the most demanding cognitive tasks humans can face in a short period of time, it might be interesting to know the impact of LL in brain plasticity and neurogenesis in the adult brain, as well as the long-term effects in terms of neuroprotective mechanisms (Bialystok et al. 2007). What might remain true for the next years is that research on brain plasticity and neuroscience might change the way we envisage the human learning brain.
This research has been supported by grants from the Spanish Government (Ministerio de Ciencia e Innovación, MCIN) to A.R.F. (BSO2002-01211/SEJ2005-06067/PSIC and Ramon y Cajal program), predoctoral grant from (FPI) to A.M.M. and postdoctoral grant to R.D.B. We would like to acknowledge the contributions and helpful comments of the following colleagues during the realization of the word-learning experiments presented in this review: A.C. Bachoud-Levi, E. Camara, L. Fuentemilla, T. Gomila, M. Laine, D. Lopez-Barroso, J. Marco-Pallares, T.F. Münte and J.M. Toro.
One contribution of 11 to a Theme Issue ‘Word learning and lexical development across the lifespan’.
↵1 Krashen (1982) distinguished two independent ways of developing language abilities, i.e. acquisition and learning. Language acquisition is envisioned as a long-term process by which infants and adults store information in the brain unconsciously, being suitable for oral as well as for written language. This acquisition process for second LL would rely on mechanisms similar to the ones involved in first language acquisition. A typical example of this process might be the amount of knowledge that a learner is able to pick up, for example, when an adult visits a foreign country and is exposed for 1 month to a new language. In contrast, LL is considered as a conscious process of knowledge development, a process that most of the time is supervised and susceptible to error corrections. An example of this type of learning would be the classic and supervised learning of grammar rules of a new language. With regard to second LL, an interesting idea dealing with cognitive control processes is also presented by Krashen (1982), which states that the knowledge of a language might act as an editor process (self-monitor). Thus, in order to produce a sentence in another language, first the unconscious ‘acquired knowledge’ would come up, and afterwards, the editor (learned knowledge) would be used to correct (self-correct) this sentence. A noteworthy claim of this framework is that the ‘learned knowledge’ might only be useful as a self-monitoring process, a time-consuming process which needs to be self-triggered, but without directly determining the fluency of the speaker. In this regard, LL might never become equal to language acquisition.
↵2 Bates et al. (1998) presented an emergentist approach as a possible solution to explain the origins of language, grammar and, at last, as a solution to the nature–nurture controversy. In this approach, innateness is defined as the amount of information in a complex outcome that was contributed by the genes (considering its differences in expression and complex regulation in response to specific environmental signals). Following Elman et al. (1996), three levels of innateness are defined, ordered based on the amount of information that must be contributed by the genes in each level (from strong to weak): (i) representational constraints: the innate structuring of the neural representations that constitute ‘knowledge’, which refers to synaptic connectivity at the cortical level; (ii) architectural constraints: the innate structuring of the information-processing system that must acquire and/or sustain these representations, which is specified as the quantity of units, layers, and types of connectivity between units, etc.; and (iii) temporal constraints: the innate constraints on the timing of developmental events, which refers to the number of cell divisions that take place in neurogenesis, the spatio-temporal waves of synaptic growth and pruning, and the relative differences in timing between subsystems (vision, audition, etc.). Bates et al. propose that only the last two levels (architectural and temporal constraints), which require much less genetically specified information, might be sufficient to provide an emergent solution to the nature–nurture controversy. Considering this approach, language would have emerged from the interaction between the architectural constraints placed by genes (nature) and the specific situations that an organism encounters in the world (nurture).
↵4 The topography of the difference waveforms between meaningful and meaningless new words at the third sentence showed a right parieto-central distribution which might fit with the standard interpretation that we are observing a modulation of the N400 component (Kutas & Federmeier 2000). However, it is noticeable in the figure that this difference comprises a larger time window (from 200 until 700 ms) and could also be related to a superimposed long-lasting positivity for meaningless new words or long-lasting negativity for the meaningful new words. The increased positivity for the new word meaningless condition could indicate a larger effort devoted to unravelling the meaning of the new word, reflected in the P3 family-like components (which is in agreement with the long reading times in the self-paced reading experiment; see Otten et al. 2007). In a similar vein, long-lasting frontal-central negativities might be expected in this situation, considering that in the new word conditions participants faced larger working memory demands compared to the real-word condition in order to assemble the different selected semantic aspects needed to derive the meaning of the new word. Long-lasting negativities have been previously observed with the increased working memory demands required for syntactic reanalysis (see Munte et al. 1998).
↵5 There is some controversy about the involvement of the arcuate fasciculus and the superior longitudinal fasciculus in the connection between the posterior STG and the premotor and nearby inferior frontal regions (mostly dorsal and opercular subregions). Schmahmann et al. (2007), using diffusion spectrum imaging (DSI) in post-mortem monkeys, stated that the most likely connection between these regions might be supported by the fibres crossing the extreme capsule, in between the claustrum and the insula. This connection might be complemented by the middle longitudinal fasciculus, which would wire the inferior parietal lobe (near the angular gyrus) to the auditory association areas (STG) and multimodal upper bank of the superior temporal sulcus. The authors suggest that both systems in monkeys might correspond to a possible precursor of a language comprehension system. This proposal diverges radically from classical views in which the communication between these regions is mediated by the arcuate and superior-longitudinal fasciculus (Catani et al. 2005; Saur et al. 2008). These inconsistencies between DTI and DSI data in humans and monkeys could be clarified in the future using comparative DTI–DSI studies between different species. For example, in a recent and interesting study, Rilling et al. (2008) found that the projection of one of the branches of the arcuate fasciculus to the middle and inferior temporal lobe parts, which is clearly present in humans (see Catani et al. 2005), was not evident in non-human primates. Rilling and colleagues proposed that the larger increase in white-matter volume in the frontal and temporal lobes might be related to the evolution of language processing and the transmission of word-meaning information in humans when compared to non-human primates (Rilling & Seligman 2002; Glasser & Rilling 2008).
↵6 A very similar idea in discourse processing has been proposed by Graesser et al. (1994), which is called the search-after-meaning principle. It states that comprehenders will always attempt (effortfully) to construct meaning out of text, social interactions and perceptual input.
↵7 There is still disagreement about the crucial areas involved in semantic and conceptual processing and how semantic-conceptual information is stored and represented in the brain (see Thompson-Schill et al. 2006; Wise & Price 2006; Patterson et al. 2007). At least four different proposals exist for the involvement of the temporo-parietal cortex in partial storage of these representations: (i) the STG/STS bilaterally, with the posterior sites being more involved in form processing and anterior sites in meaning (Scott & Johnsrude 2003); (ii) the left pSTG (including the junction between left pSTG and the inferior parietal lobe) and left MTG (Mummery et al. 1999; Binder et al. 2000; Dronkers et al. 2004; Vigneau et al. 2006; Lindenberg & Scheef 2007); (iii) the involvement of left pMTG and ITG regions (Luders et al. 1991; Nobre et al. 1994; Hickok & Poeppel 2004); and (iv) the left anterior temporal pole, which is crucially affected in semantic dementia disorder (Patterson et al. 2007; Ferstl et al. 2008).
↵8 In a recent study, Bayley et al. (2008) evaluated two patients who had large MTL lesions, extending to the fusiform and insular cortices (the same patients were evaluated in Bayley & Squire 2005). In this new study, using different tasks, some residual evidence of new factual learning was detected. This result indicates that in cases of large MTL lesions, slow and gradual semantic learning is still possible using other neocortical regions, an idea in favour of previous computational proposals (McClelland et al. 1995).
- © 2009 The Royal Society