The goal of this paper is to widen the lens on language to include the manual modality. We look first at hearing children who are acquiring language from a spoken language model and find that even before they use speech to communicate, they use gesture. Moreover, those gestures precede, and predict, the acquisition of structures in speech. We look next at deaf children whose hearing losses prevent them from using the oral modality, and whose hearing parents have not presented them with a language model in the manual modality. These children fall back on the manual modality to communicate and use gestures, which take on many of the forms and functions of natural language. These homemade gesture systems constitute the first step in the emergence of manual sign systems that are shared within deaf communities and are full-fledged languages. We end by widening the lens on sign language to include gesture and find that signers not only gesture, but they also use gesture in learning contexts just as speakers do. These findings suggest that what is key in gesture's ability to predict learning is its ability to add a second representational format to communication, rather than a second modality. Gesture can thus be language, assuming linguistic forms and functions, when other vehicles are not available; but when speech or sign is possible, gesture works along with language, providing an additional representational format that can promote learning.
Children around the globe learn to speak with surprising ease. But they are not just learning to speak—they are also learning how to use their hands as they speak. They are learning to gesture. We know that in adult speakers, gesture forms a single system with speech and is an integral part of the communicative act [1,2]. In this paper, my goal is to widen the lens on language learning to include the manual modality—to include gesture. I begin by examining the language-learning trajectory when it is viewed with this wider lens. The central finding is that children display skills earlier in development than they do when the focus is only on speech. They can, for example, express sentence-like ideas in a gesture–speech combination several months before they express these ideas in speech alone. Gesture thus provides insight into the earliest steps a language-learner takes and might even play a role in getting the learner to take those steps.
I then consider what happens if a child does not have access to the oral modality and has only the manual modality to use in communication. Deaf children who are exposed to input from a language in the manual modality, that is, an established sign language like American Sign Language (ASL), learn that language as naturally as hearing children exposed to input from a language in the oral modality [3,4]. But 90% of deaf children are born to hearing parents, who typically do not know a sign language and want their deaf child to learn a spoken language. Even when given intensive instruction in the oral modality, children with severe to profound hearing losses typically are not able to make use of the spoken language that surrounds them [6,7]. If, in addition, they do not have access to a sign language, the children are likely to turn to gesture to communicate. Under these circumstances, the manual modality steps in and gesture assumes the roles typically played by the oral modality—it takes over the forms and functions of language and becomes a system of homesigns that display many of the characteristics found in established sign languages. Gesture can thus become language under the right circumstances, although it grows into a fully complex linguistic system only with the support of a community that can pass the system along to the next generation. By observing the steps a manual communication system takes as it becomes a fully elaborated sign language, we can gain insight into the factors that have shaped human language.
I end by asking what it might mean to widen the lens on language in the manual modality, that is, to look at gesture produced along with sign language. Signers do gesture when they sign . Like the gestures that accompany speech, the gestures that accompany sign are analogue in form, and thus complement the discrete, segmented categories found in sign (and speech). But, unlike gesture and speech, gesture and sign are produced in the same (manual) modality. The question I ask is whether the gestures that accompany sign play the same role in learning as the gestures that accompany speech. Addressing this question allows us to determine whether gesture's importance in learning stems from the fact that it is produced in a different modality from speech or from the fact that it represents information in a qualitatively different format from speech.
What I hope to show is that widening the lens on language to include the manual modality gives us deeper insight into the nature and the time course of language learning and learning in general, and can also give us insight into the relationship between language and cognition. In addition, by investigating how the manual modality can be used to create language (as in homesign and emergent sign languages), we open a window onto language that lets us identify properties of language that are so ‘resilient’ that they can be developed even by a single user versus properties of language that are more fragile, and that require a community of users, and perhaps generations of learners, to emerge.
2. Widening the lens on spoken language learning to include the manual modality
(a) The gestures that accompany speech selectively predict linguistic milestones
I begin by examining the gestures that hearing children produce in the process of learning spoken language. At a time in development when children are limited in the words they know and use, gesture offers a way for them to extend their communicative range. Children typically begin to gesture between eight and 12 months [9,10], first producing deictic gestures (pointing at objects, people and places in the immediate environment, or holding up objects to draw attention to them), and later, at around 26 months, producing iconic gestures that capture aspects of the objects, actions or attributes they represent (e.g. flapping arms to refer to a bird or to flying). The fact that gesture allows children to communicate meanings that they do not yet express in speech opens up the possibility that gesturing itself facilitates language learning. If so, changes in gesture should not only predate, but should also predict, changes in language. And they do, both for words and for sentences (table 1).
(i) Nouns and verbs
The more a child gestures early on, the more words are likely to be found in the child's vocabulary later in development [13–16]. Even more compelling, we can predict which particular nouns will enter a child's verbal vocabulary by looking at the objects that the child indicated using deictic gestures several months earlier. For example, a child who does not know the word ‘cat’ but communicates about cats by pointing at them is likely to learn the word ‘cat’ within three months. Gesture paves the way for children's early nouns.
However, gesture does not appear to pave the way for early verbs—although we might have expected iconic gestures that depict actions to precede, and predict, the onset of verbs, they do not. Özçalışkan et al. observed spontaneous speech and gestures in 40 English-learning children from ages 14 to 34 months and found that the children produced their first iconic gestures for actions six months later than their first verbs. The onset of iconic gestures conveying action meanings thus followed, rather than preceded, children's first verbs.1 But iconic gestures did increase in frequency at the same time that verbs did and, at that time, children used these action gestures to convey specific verb meanings that they were not yet expressing in speech. Children thus do use gesture to expand their repertoire of verb meanings, but only after they have begun to acquire the verb system underlying their language.
Even though they treat gestures like words in some respects, children learning a spoken language very rarely combine their gestures with other gestures, and if they do, the phase tends to be short-lived . But children do often combine their gestures with words, and they produce these gesture + speech combinations well before they produce word + word combinations. Children's earliest gesture + speech combinations contain gestures that convey information that complements the information conveyed in speech; for example, pointing at a ball while saying ‘ball’ [20–24]. Soon after, children begin to produce combinations in which gesture conveys information that is different from and supplements the information conveyed in the accompanying speech; for example, pointing at a ball while saying ‘here’ to request that the ball be moved to a particular spot [19,22,25–28].
As in the acquisition of words, we find that changes in gesture (in this case, changes in the relationship gesture holds to the speech it accompanies) predict changes in language (the onset of sentences). The age at which children first produce supplementary gesture + speech combinations (e.g. point at box + ‘open’, or give gesture + ‘bottle’) reliably predicts the age at which they first produce two-word sentence-like utterances (i.e. sentences containing a verb, e.g. ‘open box’, ‘give bottle’) [17,29,30]. The age at which children first produce complementary gesture + speech combinations (e.g. point at box + ‘box’) does not. Moreover, supplementary combinations selectively relate to the syntactic complexity of children's later sentences. Rowe & Goldin-Meadow  observed 52 children from families reflecting the demographic range of Chicago and found that the number of supplementary gesture + speech combinations the children produced at 18 months reliably predicted the complexity of their sentences (as measured by the Index of Productive Syntax ) at 42 months, but the number of different meanings they conveyed in gesture (where point at dog and point at bottle are counted as conveying different meanings) at 18 months did not. Conversely, the number of different meanings children conveyed in gesture at 18 months reliably predicted their spoken vocabulary (as measured by the PPVT ) at 42 months, but the number of supplementary gesture + speech combinations they produced at 18 months did not. Gesture is thus not merely an early index of global communicative skill, but is a harbinger of specific linguistic steps that children will soon take—early gesture words predict later spoken vocabulary, and early gesture sentences predict later spoken syntax.
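The two gesture measures contrasted above can be made concrete with a small sketch. It assumes a deliberately simplified coding scheme that is not taken from the studies cited: each utterance is reduced to one gesture gloss and one spoken word, and a combination counts as supplementary when the two glosses differ. The names `Utterance`, `gesture_vocabulary` and `count_supplementary` are hypothetical, and the sample utterances are invented for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Utterance:
    # Gloss of the gesture, e.g. "box" for a point at a box; None if no gesture.
    gesture_meaning: Optional[str]
    # The single spoken word, e.g. "open"; None if the gesture was produced alone.
    speech: Optional[str]


def gesture_vocabulary(utterances):
    """Number of different meanings conveyed in gesture (gesture types, not tokens)."""
    return len({u.gesture_meaning for u in utterances if u.gesture_meaning is not None})


def is_supplementary(u):
    """Gesture and speech are both present and convey different information."""
    return (
        u.gesture_meaning is not None
        and u.speech is not None
        and u.gesture_meaning != u.speech
    )


def count_supplementary(utterances):
    """Number of supplementary gesture + speech combinations (tokens)."""
    return sum(1 for u in utterances if is_supplementary(u))


# Invented sample session (hypothetical data, for illustration only).
session = [
    Utterance("box", "open"),     # point at box + 'open'  -> supplementary
    Utterance("give", "bottle"),  # give gesture + 'bottle' -> supplementary
    Utterance("box", "box"),      # point at box + 'box'   -> complementary
    Utterance("dog", None),       # point at dog, no speech
    Utterance("bottle", None),    # point at bottle, no speech
]

print(gesture_vocabulary(session), count_supplementary(session))
```

Keeping the two counts separate mirrors the dissociation reported in the text: the first is a measure of gesture vocabulary, the second a measure of gesture + speech sentence-like combinations.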
Gesture does more than open the door to sentence construction—the particular gesture + speech combinations children produce predict the onset of corresponding linguistic milestones. Özçalışkan & Goldin-Meadow  observed 40 of the children in the Rowe & Goldin-Meadow  sample at 14, 18 and 22 months and found that the types of supplementary combinations the children produced changed over time and, critically, presaged changes in their speech. For example, the children began producing ‘two-verb’ complex sentences in gesture + speech combinations (‘I like it’ + eat gesture) several months before they produced complex sentences entirely in speech (‘help me find it’). Supplementary gesture + speech combinations thus continue to provide stepping-stones to increasingly complex linguistic constructions.
Gesture does not, however, always predict transitions in language learning. Gesture precedes and predicts linguistic developments when those developments involve new constructions, but not when the developments involve fleshing out existing constructions. For example, Özçalışkan & Goldin-Meadow  found that the 40 children in their sample produced combinations in which one modality conveyed a predicate and the other conveyed an argument (e.g. wash gesture + ‘hair’ = predicate in gesture + object in speech) several months before they produced predicate + argument combinations entirely in speech (e.g. ‘popped this balloon’ = predicate + object, both in speech). However, once the basic predicate + argument construction had been acquired in speech, the children did not rely on gesture to add arguments to the construction. Thus, the children produced their first predicate + 2 argument combinations in speech (e.g. ‘I want the Lego’ = agent + predicate + object, all in speech) and in gesture + speech (point at father + ‘have food’ = agent in gesture + predicate in speech + object in speech) at the same age .
(iii) Complex nominal constituents
As mentioned earlier, the age at which children first produce complementary gesture + speech combinations in which gesture indicates the object labelled in speech (e.g. point at cup + ‘cup’) does not reliably predict the onset of two-word sentence-like utterances, reinforcing the point that it is the specific way in which gesture is combined with speech, rather than the ability to combine gesture with speech per se, which signals the onset of future linguistic achievements. The gesture in a complementary gesture + speech combination has traditionally been considered redundant with the speech it accompanies, but gesture typically locates the object being labelled and, in this sense, has a different function from speech. Complementary gesture + speech combinations have, in fact, recently been found to predict the onset of a linguistic milestone—but they predict the onset of complex nominal constituents rather than the onset of sentential constructions.
If children are using nouns to classify the objects they label (as recent evidence suggests infants do when hearing spoken nouns ), then producing a complementary point with a noun could serve to specify an instance of that category. In this sense, a pointing gesture could be functioning like a determiner. Cartmill et al.  analysed all of the utterances containing nouns produced by 18 children in Rowe & Goldin-Meadow's  sample and focused on (i) utterances containing an unmodified noun combined with a complementary pointing gesture (e.g. point at cup + ‘cup’) and (ii) utterances containing a noun modified by a determiner (e.g. ‘the/a/that cup’). They found that the age at which children first produced complementary point + noun combinations selectively predicted the age at which the children first produced determiner + noun combinations.2 Not only did complementary point + noun combinations precede and predict the onset of determiner + noun combinations in speech, but these point + noun combinations also decreased in number once children gained productive control over determiner + noun combinations. When children point to and label an object simultaneously, they appear to be on the cusp of developing an understanding of nouns as a modifiable unit of speech, a complex nominal constituent.
Gesture has also been found to predict changes in narrative structure later in development. Demir et al. asked 38 children in Rowe & Goldin-Meadow's sample to retell a cartoon at age 5 and then again at ages 6, 7 and 8. Even at age 8, the children showed no evidence of being able to frame their narratives from a character's perspective in speech. Taking a character's first-person perspective on events has been found, in adults, to be important for creating a coherent narrative representation. Interestingly, many of the children, even at age 5, did take a character's viewpoint into account in their gestures. For example, to describe a woodpecker's actions, one child moved her upper body and head back and forth, thus assuming the perspective of the bird (as opposed to moving a beak-shaped hand back and forth and taking the perspective of someone looking at the bird, a skill that appears later in development). Moreover, the children who produced character-viewpoint gestures at age 5 were more likely than children who did not produce these gestures to go on to tell well-structured stories (as measured by the narrative structure coding system developed by Stein & Glenn) in the later years, even controlling for early syntactic skills and initial level of narrative structure. Children were thus able to use gesture to take on a character's perspective before being able to do so in speech, and those early gestures signalled upcoming developments in their spoken narrative production. Gesture thus continues to act as a harbinger of change as it assumes new roles in relation to discourse and narrative structure.
(b) The mechanisms underlying gesture's role in language learning
We have seen that early gesture predicts subsequent developments in speech across a range of linguistic constructions (table 1). Interestingly, gesture plays this role not only for children who are learning language at a typical pace, but also for those who are experiencing delays. Children with unilateral brain injury whose spoken language is delayed also display delays in gesture. Child gesture thus has the potential to serve as an early diagnostic tool, identifying which children will exhibit subsequent language delays, and which will catch up and fall within the normative range [42,43].
Why does early gesture selectively predict later spoken vocabulary size and sentence complexity? At the least, gesture reflects two separate abilities (word-learning and sentence making) on which later linguistic abilities can be built. Expressing many different meanings in gesture during development is a sign that the child is going to be a good vocabulary learner, and expressing many different types of gesture + speech combinations is a sign that the child is going to be a good sentence learner. The early gestures children produce thus reflect their cognitive potential for learning particular aspects of language. But early gesture could be doing more—it could be helping children realize their potential. In other words, the act of expressing meanings in gesture could be playing an active role in helping children become better vocabulary learners, and the act of expressing sentence-like meanings in gesture + speech combinations could be playing an active role in helping children become better sentence learners. The next sections explore this possibility.
(i) Gesture provides opportunities to practice conveying meanings
Child gesture could have an impact on language learning in at least two ways. First, gesture gives children an opportunity to practice producing particular meanings by hand at a time when those meanings are difficult to express by mouth. We know, for example, that early gesture use is related to later vocabulary size. In a mediation analysis, Rowe & Goldin-Meadow  found that the relatively large vocabularies children from high SES families display at 54 months can be partially explained by child gesture use at 14 months. In turn, child gesture use at 14 months can be explained by parent gesture use at 14 months, even when parent speech is controlled. Importantly, parent gesture does not appear to have a direct effect on subsequent child spoken vocabulary—the effect is mediated through child gesture, suggesting that it is the act of gesturing on the part of the child that is critical.
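The logic of a mediation analysis of this kind can be sketched on simulated data. The sketch below uses the simple product-of-coefficients approach with ordinary least squares, which is an assumption made here for illustration and not necessarily the procedure used in the study. All function names, the sample size and the effect sizes (0.6 and 0.5) are hypothetical, and the data are randomly generated, not drawn from any child sample.

```python
import random


def centered(values):
    """Subtract the mean so that slopes can be computed without an intercept term."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def dot(xs, ys):
    return sum(x * y for x, y in zip(xs, ys))


def simple_slope(x, y):
    """OLS slope of y on x (the intercept is handled by centering)."""
    xc, yc = centered(x), centered(y)
    return dot(xc, yc) / dot(xc, xc)


def two_predictor_slopes(x1, x2, y):
    """OLS slopes of y on x1 and x2 together, solved from the 2x2 normal equations."""
    a, b, c = centered(x1), centered(x2), centered(y)
    s11, s12, s22 = dot(a, a), dot(a, b), dot(b, b)
    s1y, s2y = dot(a, c), dot(b, c)
    det = s11 * s22 - s12 * s12
    return (s1y * s22 - s2y * s12) / det, (s2y * s11 - s1y * s12) / det


def mediation_effects(parent_gesture, child_gesture, child_vocab):
    """Split the parent-gesture effect on vocabulary into direct and indirect parts."""
    a = simple_slope(parent_gesture, child_gesture)           # parent -> child gesture
    b, direct = two_predictor_slopes(child_gesture, parent_gesture, child_vocab)
    total = simple_slope(parent_gesture, child_vocab)         # parent -> vocabulary
    return {"a": a, "b": b, "indirect": a * b, "direct": direct, "total": total}


def simulate(n=5000, seed=0):
    """Hypothetical data in which parent gesture acts ONLY through child gesture."""
    rng = random.Random(seed)
    parent = [rng.gauss(0, 1) for _ in range(n)]
    child = [0.6 * p + rng.gauss(0, 1) for p in parent]
    vocab = [0.5 * c + rng.gauss(0, 1) for c in child]
    return parent, child, vocab


parent, child, vocab = simulate()
effects = mediation_effects(parent, child, vocab)
print(effects)
```

In data built this way, the direct path from parent gesture to vocabulary is close to zero once child gesture is in the model, while the indirect path (a × b) carries the whole effect. This is the pattern that the text describes as full mediation through child gesture.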
Although these findings suggest that child gesture is playing a causal role in language learning, we need to manipulate gesture to be certain of this claim. Previous work has found that telling 9- and 10-year-old children to gesture when explaining how they solved a math problem does, in fact, make them particularly receptive to subsequent instruction on that problem—the gesturing itself appears to be responsible for their improved performance after instruction. As another example more relevant to language learning, LeBarton et al. studied 15 toddlers (beginning at 17 months) in an eight-week at-home intervention study (six weekly training sessions plus follow-up two weeks later) in which all children were exposed to object words, but only some were told to point at the named objects. Before each training session and at follow-up, children interacted naturally with their parents to establish a baseline against which changes in communication were measured. Children who were told to gesture increased the number of gesture meanings they conveyed not only when interacting with the experimenter during training, but also when later interacting with their parents. Critically, these experimentally induced increases in gesture led to larger spoken repertoires at follow-up. The findings suggest that gesturing can play an active role in word-learning, perhaps because gesturing to a target picture in the context of labelling focuses children's attention on objects in the environment, on the labels, or on the object–label relation [46,47]. Children's active engagement in the bidirectional labelling context when told to gesture may also draw their attention to gesture's communicative function, which could also have beneficial consequences for vocabulary development [48–51].
Although we know that encouraging children to point at objects enhances word-learning, there have been no studies to date encouraging children to produce supplementary gesture + speech combinations. We thus know only that early supplementary gesture + speech combinations reflect the child's readiness to produce two-word utterances. More work is needed to determine whether these combinations play an active role in bringing about the onset of two-word utterances.
(ii) Gesture elicits timely speech from listeners
The second way in which child gesture could play a role in language learning is more indirect—child gesture could elicit timely speech from listeners. Because gesture seems to reflect a child's readiness for acquiring a particular linguistic structure, it has the potential to alert listeners (parents, teachers and clinicians) to the fact that a child is ready to learn that word or sentence. Listeners who pay attention to those gestures and can ‘read’ them might then adjust their talk, providing just the right input to help the child learn the word or sentence. Consider a child who does not yet know the word ‘rabbit’ and refers to the animal by pointing at it. His obliging mother responds, ‘yes, that's a rabbit’, thus supplying him with just the word he is looking for. Or consider a child who points at her mother while saying the word ‘hat’. Her mother replies, ‘that's mommy's hat’, thus translating the child's gesture + word combination into a simple sentence.
Just as mothers are sensitive to whether their children are familiar with the words they present, adjusting their strategies to make the word comprehensible (e.g. linking the new word to related words, offering terms that contrast with it directly, situating it by appealing to past experiences ), mothers are sensitive to their children's gestures [25,54]. Mothers translate into their own words not only the single gestures that children produce (e.g. ‘that's a bird’, produced in response to the child's point at a bird), but also the gestures that children produce in combination with words conveying different information, that is, supplementary gesture + speech combinations (‘the bird's taking a nap’, produced in response to the child's point at bird + ‘nap’) . Interestingly, mothers produce longer (and potentially more syntactically complex) sentences in response to their children's supplementary gesture + speech combinations (point at bird + ‘nap’) than to their complementary gesture + speech combinations (point at bird + ‘bird’). Moreover, mothers' sentences tend to be longest when they pick up on information conveyed in child speech and gesture (e.g. ‘the bird's taking a nap’), despite the fact that they could easily have produced sentences that are just as long when they pick up on information conveyed only in the child's speech (‘It's time for your nap’) or only in the child's gesture (‘It's just like grandma's bird’) or when they ignore the child's utterance entirely (‘Let's read another book’) .
If child gesture is playing an instrumental role in language learning, mothers' translations ought to be related to later word- and sentence-learning in their children—and they are . In terms of word-learning, when mothers translate the gestures that their children produce into words, those words are more likely to quickly become part of the child's vocabulary than words for gestures that mothers do not translate. In terms of sentence-learning, children whose mothers frequently translate their child's gestures into speech tend to be first to produce two-word utterances. The age at which children produce their first two-word utterance is highly correlated with the proportion of times mothers translate their child's gestures into speech, suggesting that mothers' targeted responses to their children's gestures might be playing a role in helping the children take their first steps into multiword combinations. Because they are finely tuned to a child's current state (cf. Vygotsky's zone of proximal development ), adult responses of this sort could be particularly effective in teaching children how an idea is expressed in the language they are learning.
3. When the manual modality is all that the language-learner has
As described earlier, children whose hearing losses prevent them from acquiring spoken language and whose hearing parents have not exposed them to sign language turn to gesture to communicate. These gestures, called homesigns, display many of the properties found in the early communication systems that hearing children learn from their spoken language models and that deaf children learn from their signed language models.
(a) The linguistic milestones found in homesign
(i) Nouns and verbs
Homesigners use pointing gestures to refer to the objects, people and places in their immediate surroundings. These gestures function like demonstratives (this, that) and can stand in for nouns in the children's gesture sentences, for example, point at jar—twist gesture = that (jar) twist. The demonstrative pointing gesture can be used to refer to any entity that is present, and homesigners use their pointing gestures to refer to the full range of entities that young hearing children refer to with their words, e.g. people, inanimate objects, body parts and places .
Homesigners use two additional devices to refer to entities. They produce pointing gestures that refer not to the specific object at the end of the point, but rather to the class of objects that the indexed object belongs to. For example, a homesigner points at the bubble jar, which is already open, and produces an iconic twist gesture; he wants his mother to open the bubble jar that she is holding, but he uses the (open) jar that is near him to indicate the kind of object he wants opened. These gestures are called category points, and homesigners typically begin producing them later in development than demonstrative points .
In addition to category points, homesigners also produce iconic gestures (gestures that represent an aspect of an action or object through pantomime) that function like nouns . For example, a homesigner moved two fists as though steering a car to describe a picture of a motionless car. When functioning as a noun, these gestures evoke a class of objects, rather than a specific object (unless, of course, they are accompanied by a pointing gesture). Homesigners tend to produce their first iconic noun gestures during the same observation session in which they first produce category points .
Homesigners also use their iconic gestures as verbs and adjectives . For example, a homesigner might use the two-fisted steer gesture to describe a scene in which a car is being driven, or to ask that a toy animal drive a car; this gesture is functioning like a verb. As another example, the child forms a round circle with his fingers to describe the shape of a penny; this gesture is functioning like an adjective.
The little that is known about the steps homesigners follow in developing nouns and verbs comes from a case study of an American homesigner, David [59,60]. David used all three devices (demonstrative pointing gestures, category pointing gestures and noun iconic gestures) to refer to entities, and also used iconic gestures as verbs and adjectives, during his first observation session at 2;10 (years;months).
The interesting developmental story is that, at all moments during this developmental period, David maintained a distinction between his nouns and verbs, but used different devices to do so over time. During the earliest period beginning at 2;10, David predominantly used demonstrative pointing gestures to refer to entities, but also used a few iconic noun gestures and category pointing gestures. Interestingly, his iconic noun gestures, which were potentially confusable with his iconic verb gestures, were distinguished in two ways: (i) David used different stems for his noun and verb iconic gestures; for example, if he used the twist stem (C handshape + rotate motion) in a verb context to refer to twisting open a jar, he did not use the twist stem in a noun context to refer to the jar itself . In other words, David had no noun–verb pairs containing the same handshape + motion stem. In this way, David resembled English-learning children whose first uses of words that can serve as both nouns and verbs were restricted to only one use; for example, the child would use ‘comb’ as either a verb (‘I comb hair’) or a noun (‘gimme comb’), but not both . (ii) In the noun and verb iconic gestures that David did produce early in development, he used handshape to distinguish the two types of gestures: he used handling handshapes in gestures used as verbs (i.e. the handshape represented a hand as it holds an object, e.g. two fists held as though beating a drum to refer to beating), but object handshapes in gestures used as nouns (i.e. the handshape represented features of the object itself, e.g. extending a flat palm to refer to an oar) .
Between ages 3;3 and 3;5, David stopped distinguishing between nouns and verbs in these particular ways; that is, he no longer had a restriction on using the same handshape + motion stem in a noun–verb pair (e.g. he could now use the twist stem to mean both twist and jar) , and he no longer restricted handling handshapes to verbs and object handshapes to nouns (e.g. he might use the two-fist handshape in a gesture referring to a drum, and the flat-palm handshape in a gesture referring to rowing) .3 But he developed new ways of distinguishing between nouns and verbs—he tended to abbreviate his noun gestures (e.g. he produced the drumming movement fewer times when the gesture served as a noun than when it served as a verb), and he tended to inflect his verb gestures (e.g. he displaced the drumming movement towards a drum, the patient, more often when the gesture served as a verb than when it served as a noun) . The way in which the homesigner makes the noun–verb distinction thus appears to vary as a function of the complexity of his homesign system.
Homesigners combine gestures into strings and those gesture strings display many of the properties found in the early sentences produced by hearing children learning spoken language and deaf children learning sign languages—semantically their sentences convey the same types of propositions, and syntactically their sentences are structured at both underlying and surface levels . In this sense, homesigners' gesture sentences warrant the label ‘sentence’.
Homesigners produce four types of action propositions in their gesture sentences (the proposition was determined using gesture form and context; see [57,63] for details): transitive acts with a recipient or endpoint (I give cookie to you), transitive acts without a recipient (I close box), intransitive acts with a recipient or endpoint (I go outside) and intransitive acts without a recipient (I dance); and six types of attribute propositions: nominal predicates (this is a ball), descriptor relations (ball is small), location relations (toaster is located in kitchen), possessive relations (toy trains belong to me), similarity relations (cup 1 resembles cup 2) and picture identification relations (picture of car resembles toy car).
The action proposition sentences homesigners produce are characterized by an underlying predicate structure. They produce sentences with a predicate and three arguments (e.g. give—point at self to mean you-give-me-apple), sentences with a predicate and two arguments (e.g. point at apple—eat, to mean you-eat-apple, or point at experimenter—move to mean you-move-here) and sentences with a predicate and one argument (e.g. point at dad—sleep to mean dad-sleep). Evidence for these underlying structures comes from the fact that the likelihood of producing a gesture for a particular argument depended on the underlying structure of the sentence (e.g. children were more likely to produce a gesture for apple when it was part of a three-element predicate frame, you-eat-apple, than when it was part of a four-element predicate frame, you-give-me-apple, simply because there was less competition among the underlying elements in a three-element frame than in a four-element frame). Interestingly, although the children, at times, produced gestures for all of the elements in a predicate frame, this was quite rare. In other words, the children rarely fleshed out their predicate frames.
One additional point in relation to underlying structure is worth making—it is the underlying predicate frame that determines when a gesture for a particular argument (the actor, for example) appears in surface structure, not how easy it is to guess the actor from context. If predictability in context were the key, first-person actors (the child him or herself) and second-person actors (the communication partner) should be omitted regardless of underlying predicate frame because their identity can be easily guessed in context (both persons are on the scene); and third-person actors should be gestured quite often regardless of underlying predicate frame because they are less easily guessed from context. However, Goldin-Meadow [64, p. 237] found that the systematic decrease in actor production probability as the number of potential arguments in underlying structure increases (from 1-argument to 2-argument to 3-argument) holds for first-person, for second-person and for third-person actors when each is analysed separately. The predicate frame underlying a sentence is thus an essential factor in determining how often the actor (and other semantic elements, e.g. the patient) will be gestured in that sentence.
In terms of the surface structure of the homesigners' gesture sentences, the elements that are explicitly gestured follow consistent patterns of two types: (i) Production probability patterns. Children are likely to produce gestures for particular arguments in a predicate frame; for example, they are more likely to produce a gesture for the patient, drum, than for the agent, drummer, in a sentence conveying the 3-element transitive predicate, beat. (ii) Gesture order patterns. Children have preferred positions in which they place gestures for particular arguments; for example, they tend to produce gestures for patients before gestures for actions, e.g. point at drum—beat.
Although homesigners do not typically flesh out their predicates with gestures for additional arguments, they do elaborate their sentences by adding a second clause, that is, by constructing complex sentences containing two or more propositions. They typically produce clauses that are coordinately conjoined (e.g. point at jar—give—shake to ask the experimenter to give him a jar so that he can shake it), but they can also produce a second clause that is subordinate to the main clause (e.g. flutter—fall—point at boots—point at skates—glide, a comment indicating that we wear boots and skate when snow flutters and falls; the when clause is subordinate to the main clause). Importantly, the two-clause complex sentences homesigners produce have also been shown to have underlying predicate frames, providing evidence for an overarching sentence node [65,66].
(iii) Complex nominal constituents
A second way in which homesigners elaborate their sentences is to add complexity within a constituent, in particular, within the nominal constituent. As mentioned earlier, homesigners refer to entities by producing a demonstrative pointing gesture (point at bird = that) or an iconic noun gesture (flap palms at sides, bird = bird). At times, however, the children combine demonstrative pointing gestures with iconic noun gestures (e.g. point at bird—bird—pedal, to describe a bird who is pedalling a bicycle) to construct a complex nominal constituent, [[that bird] pedals]. These combinations function semantically and syntactically like complex nominal constituents in conventional languages, and also function as a unit in terms of sentence length (i.e. sentences containing complex nominal constituents were longer than sentences the child would have been expected to produce based on norms derived from the child's gesture sentences without complex nominal constituents).
Interestingly, homesigners tend to elaborate their sentences first by adding a second clause (i.e. producing coordinate sentences) and later by embedding information within a constituent (i.e. producing complex nominal constituents), whereas children learning conventional spoken language show the opposite pattern [68,69], even when they are learning spoken languages that allow a great deal of noun omission.
Homesigners are able to use their gestures to recount stories, and those gestured stories are of the same types, and of the same structure, as those told by hearing children within their cultures—they tell stories about positive events (e.g. emotional gain), negative events (e.g. physical harm) and routine events. In a study of narratives produced by four Chinese and four American homesigners, Phillips, Goldin-Meadow and Miller found that all eight children produced at least one gesture narrative, but varied greatly in the total number of narratives they produced. Despite the variability in frequency of narration, the homesigners displayed very similar structural patterns in their narratives. All eight children elaborated upon the basic narrative, including setting information and voluntary actions in their stories. Some children in each cultural group went further and produced narratives containing a complication and temporal order as well. Moreover, the two children who produced enough narratives to discern a developmental pattern (one Chinese and one American homesigner) advanced their narrative skill by adding one feature at a time in a manner consistent with descriptions of the developmental patterns seen in hearing children [71,72].
The narratives experienced by children who are exposed to a language model are saturated with cultural meanings; they provide cues about how to interpret experience, about what is valued, about what counts as a narratable event [73–76]. Unable to hear the verbal narratives that surround them, homesigners do not have full access to the socializing messages narratives provide. Nonetheless, their narratives bear echoes of culture-specific meaning. For example, Chinese homesigners use evaluative comments in their narratives more often than American homesigners, thus mirroring the cultural patterns found in Chinese and American hearing children learning to tell stories from a spoken language model. Homesigners can thus produce culturally appropriate narrations despite their lack of a verbal language model, suggesting that these particular cultural messages are accessible through non-verbal channels and are thus so important that they are not entrusted to a single medium.
(b) Homesign is the first step towards an established sign language
We have seen that homesigning children have gesture systems that contain many of the basic properties found in all natural languages. But child homesign is not a full-blown language, and for good reason. The children are inventing their gesture systems on their own without a community of communication partners. Indeed, when homesign children were brought together after the first school for the deaf was opened in Nicaragua in the late 1970s, their gesture systems began to cohere into a recognized and shared language. That language, Nicaraguan Sign Language (NSL), became increasingly complex, particularly after a new generation of deaf children learned the system as a native language.
The circumstances in Nicaragua permit us to go beyond uncovering skills children bring to language learning to gain insight into where those skills fall short; that is, to discover which properties of language are so fragile that they cannot be developed by a child lacking access to a conventional language model. By comparing current-day child homesigners in Nicaragua with groups whose circumstances have allowed them to go beyond child homesign, we can begin to develop hypotheses about which properties of language are fragile, and which conditions foster the development of these relatively fragile properties (hypotheses that will then need to be tested using other approaches, e.g. artificial language learning studies). We begin by observing changes made to the system when it remains the homesigner's sole means of communication into adulthood [79,80]. Studying adult homesigners allows us to explore the impact that cognitive and social maturity have on linguistic structure. We can also observe changes made to the system when it becomes a community-wide language as homesigners come together for the first time [81,82]. Studying the signers who originated NSL allows us to explore the impact that a community in which signers not only produce, but also receive their communication, has on linguistic structure. Finally, we can observe changes made to the system when it is passed through subsequent generations of learners [83,84]. Studying generations of NSL signers allows us to explore the impact that passing a newly birthed language through new learners has on linguistic structure. In addition, as a backdrop, we can study the gestures that hearing speakers produce, with speech and without it [80,86], to better understand the raw materials out of which these newly emerging linguistic systems have risen.
The sign language that is evolving in Nicaragua gives us the opportunity to watch language as it grows. For example, Goldin-Meadow et al. charted the development of handshape use in nouns versus verbs in three Nicaraguan groups: (i) adult homesigners who were not part of the deaf community and used their own homesigns to communicate; (ii) NSL cohort 1 signers who fashioned the first stages of NSL; and (iii) NSL cohort 2 signers who learned NSL from cohort 1. In addition, they compared handshapes produced by these three groups with those produced by (iv) native signers of ASL, an established sign language. They focused on handshapes in classifier verbs, which are part of a productive classifier system in ASL and thus ought to vary across agent (e.g. someone moves a pen) versus no-agent (the pen moves on its own) contexts, unlike the nouns in their study, which were frozen lexical items. They found that all of the groups, including homesigners, used the same handshape form in both an agent and a no-agent context more often when labelling the object (e.g. the noun for pen) than when describing the event (e.g. the verb for moving); that is, there was less variability across contexts in noun handshape forms than in verb handshape forms. Importantly, the variability found in verbs was systematic, as it ought to be if the verbs are functioning like classifier predicates—all groups used object handshapes when describing no-agent events, but used both handling and object handshapes when describing agent events. In contrast to these grammatical properties, which are already present in homesign, stability in noun forms does not appear to be a linguistic property that an individual will necessarily develop without pressure from a peer linguistic community—individual homesigners used a number of different handshapes to label a particular object, whereas NSL and ASL signers tended to use only one.
The manual modality can thus take on linguistic properties, even in the hands of a young child not yet exposed to a conventional language model. But it grows into a full-blown language only with the support of a community that can transmit the system to the next generation. Examining the steps a manual communication system takes as it moves towards becoming a fully-fledged sign language offers a unique window onto factors that have made human language what it is.
4. Does gesture contribute another modality to learning or another representational format?
Signers of established sign languages like ASL gesture when they sign, but their gestures are produced in the same modality as their signs. Their utterances therefore do not constitute a multi-modality expression. However, those utterances do contain more than one representational format—an analogue format underlying gesture, and a discrete segmented format underlying sign [88–90], comparable to the analogue format that underlies the gestures that accompany speech, and the discrete format that underlies the speech itself [1,2]. In this section, we take another look at gesture's role in learning and ask whether what is key about gesture in this role is that it adds a second modality (i.e. it adds the manual to the oral⁴) or that it adds a second representational format (i.e. it adds the analogue to the discrete). To address this question, we turn to children who are native signers and who were studied in a learning context. As there are currently no relevant studies examining language learning in young signers, we turn to learning in a different domain—mathematical equivalence—and in older children—9- to 10-year-olds.
There is evidence that gesture plays the same role, at least in some respects, in older math-learners as it does in younger word-learners. As described earlier, when examined in relation to speech, gesture can predict early linguistic milestones in hearing children learning a spoken language; for example, a child who produces an utterance containing a gesture that conveys different information from the information conveyed in the speech (‘mama’ + point at bottle, to indicate that mom is preparing a bottle) is on the cusp of producing two-word utterances (‘mama bottle’) and is likely to do so within three months. We find the same effect in older children asked to solve a math task. Consider, for example, a child asked to solve the problem 4 + 2 + 6 = __ + 6. The child puts 12 in the blank and has thus (incorrectly) solved the problem using an add-to-equal-sign strategy. When asked to explain her solution, she says, ‘I added the 4, the 2 and the 6’, while pointing at the 4, the 2, the 6 on the left side of the equation and the 6 on the right side of the equation, and then says, ‘to get 12’, while pointing at the 12 in the blank—she has conveyed an add-to-equal-sign strategy in speech (4 + 2 + 6) but an add-all-numbers strategy in gesture (4 + 2 + 6 + 6). The child has thus conveyed different information in her gestures from what she conveyed in her speech, a gesture–speech mismatch. Importantly, children who produce many gesture–speech mismatches on the math task are likely to learn how to solve the problem after a math lesson—more likely than children who do not produce gesture–speech mismatches on the problem [92,93]. This effect has also been found in children learning to solve conservation problems, in children learning to solve balance problems and in adults learning to solve stereoisomer problems in chemistry.
Gesture–speech mismatches juxtapose two different ideas within a single response. Is juxtaposing different ideas across two modalities essential for gesture–speech mismatch to predict increased learning? If so, then mismatch between sign and gesture (i.e. mismatch within one modality) should not predict learning in signers, unlike mismatch between speech and gesture (i.e. mismatch across two modalities), which does predict learning in speakers. Alternatively, it may be the representational formats within which different ideas are conveyed that are responsible for mismatch predicting learning. If so, juxtaposing different ideas across two distinct representational formats regardless of modality should be key, and mismatching gesture should predict learning in signers as well as speakers.
Goldin-Meadow et al. explored this question in 40 ASL-signing deaf children and found, first, that the child signers produced gestures along with their signed explanations as often as hearing children produced gestures along with their spoken explanations on these problems. Moreover, the signers produced gesture–sign mismatches as often as the hearing children produced gesture–speech mismatches. For example, on the problem 5 + 9 + 2 = _ + 2, one signer produced the (incorrect) ‘add-to-equal-sign’ strategy in sign (fourteen, add, two, answer, sixteen, i.e. the child indicated that the three numbers on the left side of the equation should be added (14, which is the sum of 5 + 9, plus 2) and the sum, 16, put in the blank). At the same time, she produced a gesture indicating the two unique numbers on the left side of the equation (5 + 9) and no other numbers, thus conveying the (correct) ‘grouping’ strategy (i.e. group and add 5 and 9) in gesture. As another example, on the problem 7 + 4 + 2 = 7 + _, another signer produced the (incorrect) ‘add-to-equal-sign’ strategy in sign (add 7 + 4 + 2, put 13, i.e. an add sign produced over the 7, 4 and 2 on the left side of the equation, and a put sign produced over the 13 in the blank). At the same time, she produced gestures conveying the (correct) ‘add-subtract’ strategy (indexing gestures at the 7, the 4 and the 2 on the left side of the equation, combined with a take-away gesture over the 7 on the right side of the equation, i.e. add up all of the numbers on the left side of the problem and subtract the number on the right).
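The arithmetic behind the four strategies named in this section can be made explicit with a short sketch. It is an illustration only and not part of the studies described: the function name `strategy_answers` and its input format are hypothetical, and the sketch assumes every addend given on the right side of the equation also appears on the left, as it does in the three problems quoted above.

```python
from typing import Dict, List


def strategy_answers(left: List[int], right_known: List[int]) -> Dict[str, int]:
    """Return the number each strategy would put in the blank.

    left        -- addends on the left side of the equal sign
    right_known -- addends already given on the right side (blank excluded)
    """
    # Grouping: add only the left-side numbers that are not repeated on the right.
    unique_left = list(left)
    for n in right_known:
        # Assumption: each right-side addend also appears on the left.
        unique_left.remove(n)

    return {
        # Incorrect: add everything up to the equal sign.
        "add_to_equal_sign": sum(left),
        # Incorrect: add every number in the problem.
        "add_all_numbers": sum(left) + sum(right_known),
        # Correct: add the left side, then subtract the right-side number.
        "add_subtract": sum(left) - sum(right_known),
        # Correct: group and add the numbers unique to the left side.
        "grouping": sum(unique_left),
    }


# The three problems quoted in the text.
problems = {
    "4 + 2 + 6 = _ + 6": ([4, 2, 6], [6]),
    "5 + 9 + 2 = _ + 2": ([5, 9, 2], [2]),
    "7 + 4 + 2 = 7 + _": ([7, 4, 2], [7]),
}

for text, (left, right_known) in problems.items():
    print(text, strategy_answers(left, right_known))
```

For 4 + 2 + 6 = _ + 6, the sketch gives 12 for add-to-equal-sign and 18 for add-all-numbers, while both correct strategies give 6. A mismatch, in these terms, is a response in which sign (or speech) and gesture convey different strategies for the same problem.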
Even more important from the point of view of our discussion here, the more gesture–sign mismatches signers produced in their problem-solving explanations prior to instruction, the more likely they were to profit from the lesson and solve the problems successfully after instruction. It thus appears to be gesture's ability to introduce a second representational format that is key to its success in predicting learning—mismatch can predict learning whether the categorical information is conveyed in the manual (sign) or oral (speech) modality. However, these findings leave open the possibility that the analogue information must be conveyed in the manual modality. The manual modality may be privileged when it comes to expressing emergent or mimetic ideas, perhaps because our hands are an important vehicle for discovering properties of the world [98–100].
As a final caveat, it is important to point out that the gestures the deaf and hearing children produce along with the math explanations are different from the kinds of gestures that children produce in the early stages of language learning. The learning task facing the pre-linguistic child is language itself. When gesture is used in these early stages, it is used as an assist into the linguistic system, substituting for words that the child has not yet acquired. But once the basics of language have been mastered, children are free to use gesture for other purposes—in particular, to help them grapple with new ideas in other cognitive domains, ideas that are often not easily translated into a single lexical item. As a result, although gesture conveys ideas that do not fit neatly into speech throughout development, we might expect to see a transition in the kinds of ideas that gesture conveys as children become proficient language users. Initially, children use gesture as a substitute for the words they cannot yet express. Later, once they master language and other learning tasks present themselves, they use gesture to express more global ideas that do not fit neatly into word-like units.
Widening the lens on language to include the manual modality has given us a deeper understanding of language learning and learning in general. Hearing children who are acquiring spoken language use gesture along with speech to communicate, and those gesture + word combinations precede, and predict, the acquisition of word + word combinations conveying the same notions. These findings make it clear that children have an understanding of these notions before they are able to express them in speech, thus eliminating one frequently held explanation for the slow acquisition of certain structures—the cognitive explanation, that is, that children do not express a given structure because they lack an understanding of the notion underlying the structure. Widening our lens to include the manual modality thus allows insight into when cognition does, and does not, shape the course of language learning.
We have also seen that when a child is prevented from using the oral modality, that child can fall back on the manual modality, creating gestures that assume many of the forms and functions of language. These homemade gesture systems constitute the first step in the emergence of a manual sign system, which can, under the right circumstances, become a fully-fledged language. Widening the lens to include the manual modality thus allows insight into the skills children themselves bring to language learning (insight that is difficult to come by when we look only at children acquiring spoken language under typical learning conditions), and into the factors that can lead to the emergence of fully complex, conventional linguistic systems.
Finally, when we widen the lens on conventional sign languages to include gesture, we can address a question that cannot be examined looking solely at spoken language. The representational formats displayed across two modalities in speakers—the categorical in the oral modality (speech) and the analogue in the manual modality (gesture)—appear in one modality in signers—the categorical (sign) and the analogue (gesture), both in the manual modality. Nevertheless, we find that gesturing in signers predicts learning just as gesturing in speakers does, suggesting that what matters for learning is the presence of two representational formats (analogue and categorical), rather than two modalities (manual and oral).
What is ultimately striking about children is that they are able to use resources from either the manual or the oral modality to communicate in distinctively human ways. When other vehicles are not available, the manual modality can assume linguistic forms and functions and be language. But when either speech or sign is available, the manual modality becomes part of language, providing an additional representational format that helps promote learning. As researchers, we too need to use resources from both the manual and the oral modalities to fully understand language, learning and cognition.
Preparation of this chapter was supported in part by grant no. R01 DC00491 from NIDCD, grant nos. R01 HD47450 and P01 HD40605 from NICHD and grant no. SBE 0541957 from NSF to the Spatial Intelligence and Learning Center (the author is a co-PI).
One contribution of 12 to a Theme Issue ‘Language as a multimodal phenomenon: implications for language learning, processing and evolution’.
¹ Estigarribia & Clark have found that pointing gestures attract and maintain attention in talk differently from iconic (or, in their terms, demonstrating) gestures, which may account for the fact that pointing gestures predict the onset of nouns, but iconic gestures do not predict the onset of verbs.
² This selectivity can be seen in the fact that the onset of complementary point + noun combinations (point at box + ‘box’) predicted the onset of determiner + noun combinations (‘the box’) in these 18 children, but not the onset of two-word combinations containing a verb (e.g. ‘open box’), which is predicted by the onset of supplementary gesture + speech combinations (point at box + ‘open’).
³ Interestingly, David treated iconic gestures serving two different noun functions in precisely the same way with respect to handshape—he used object handshapes in iconic gestures that serve a nominal predicate function (e.g. that's a bird), as well as in iconic gestures that serve a nominal argument function (e.g. that bird pedals). This pattern suggests that noun is an overarching category in David's homesign system—an important finding in itself.
⁴ Although facial gestures have less potential for transparency than manual gestures, they may also offer a second window onto a speaker's thoughts and thus predict learning.
© 2014 The Author(s). Published by the Royal Society. All rights reserved.