The movements we make with our hands both reflect our mental processes and help to shape them. Our actions and gestures can affect our mental representations of actions and objects. In this paper, we explore the relationship between action, gesture and thought in both humans and non-human primates and discuss its role in the evolution of language. Human gesture (specifically representational gesture) may provide a unique link between action and mental representation. It is kinaesthetically close to action and is, at the same time, symbolic. Non-human primates use gesture frequently to communicate, and do so flexibly. However, their gestures mainly resemble incomplete actions and lack the representational elements that characterize much of human gesture. Differences in the mirror neuron system provide a potential explanation for non-human primates' lack of representational gestures; the monkey mirror system does not respond to representational gestures, while the human system does. In humans, gesture grounds mental representation in action, but there is no evidence for this link in other primates. We argue that gesture played an important role in the transition to symbolic thought and language in human evolution, following a cognitive leap that allowed gesture to incorporate representational elements.
A growing body of evidence suggests that movements of the body not only reflect processes of the mind but can also influence them. We focus here on one particular type of movement—representational gesture. These gestures have the potential to provide a link between action and thought because gesture offers a vehicle not only for representing information about action, but also for representing that information outside of the context of real-world acts. Representational gestures are hand movements that often resemble the actual movements involved in acting on objects (e.g. rotating the hand in the air as though twisting a jar lid). However, gestures represent rather than replicate actions. Unlike actions, gestures do not bring about physical change in the environment (the twisting motion does not actually open the lid). They can, however, change how we (and others) think and speak, and may have played a central role in developing the human ability to think and speak.
In this paper, we review and discuss the relationship between action, gesture and mental representation in humans, and assess the comparative evidence for a link between action, gesture and thought in non-human primates. We begin by reviewing studies of action's influence on thought, focusing on evidence that experience doing actions affects the mental representations of those actions. We then turn to a discussion of human gesture. After describing how and when gesture is used, we address gesture's relationship to action. Representational gestures can depict action in a number of different ways—for example, the hand can represent the hand of an agent performing an action on an object, the object itself or the trajectory of the motion. We end our discussion of human gesture by exploring gesture's influence on mental representation, and conclude that gesture can have a stronger influence on thought than action itself. In fact, it has a unique ability to act as a bridge between thought and action because it is both kinaesthetically close to action and yet also symbolic.
In the final section, we turn to action representation and gesture in non-human primates (specifically, monkeys and apes). We review evidence that non-human primates are adept at understanding and performing actions, but suggest that they are not able to represent actions symbolically in gesture. We focus on the naturalistic (i.e. not human-taught) gestures of great apes and compare them with human gestures. Although there are many differences between human and ape gestures (synchronization with vocalization, systematic patterning, social motivation to share information), it is the ability to represent action through gesture that seems to be unique to humans and key to the differences in non-vocal communication. Non-human primates can use gesture in complex ways, but their gestures are often abbreviated versions of actions, and are not representational or ‘symbolic’ in the way that many human gestures are. Moreover, unlike the human mirror neuron system, the monkey mirror system does not respond to manual representational gestures, suggesting that the mirror system may play an important role in distinguishing the way action and gesture are processed in humans versus non-human primates. We conclude that gesture does not serve as a bridge between action and cognition in non-human primates and that building this bridge may have been an important step in human evolution.
2. Action and thought
(a) The relation between action and thought
The mind and the body have historically been studied as separate entities, leading to the view that cognition and action are independent domains (see [1,2] for discussion). Recent theories of human cognition suggest that the mind is not an isolated system but rather is integrated into the body's sensorimotor systems, and that our representations of objects and events are linked to our experiences of acting on the world (e.g. [3–7]). This embodied approach to cognition places a heavy emphasis on the idea that our mental representations of objects, events and many types of information we encounter arise from (and are linked to) our physical experiences interacting with the world. For example, when asked to make preference judgements between non-sensical letter sequences, skilled typists preferred pairs of letters that are typed with different fingers on different hands (letters that would be physically easier to type if one were to type them) over pairs typed with the same finger. Novice typists with little previous typing experience had no such preference. Importantly, neither group could explain the differences between the letter pairs, suggesting that the skilled typists' preference was unconsciously based on their previous motor experience of how easy or hard it was to type the presented letters. All else being equal, we generally prefer what is easiest to act on, perceive, read, etc. Our prior sensorimotor experiences are so tightly linked to our mental representations that they can influence our thinking about objects or events even in scenarios far removed from relevant actions.
Despite growing evidence that action influences thought, the process through which action interacts with representation is not well understood. Some propose that neural representations of objects and events are built upon neural activations that arose during past experience interacting with objects and events in the world (e.g. ). Under this view of embodied cognition, mental representations of objects and events reflect, and to some extent rely on, traces of neural activation (or ‘perceptual symbols’) caused by past real-world interactions (for a review, see ). Others propose that the physical limitations of brain size require that areas primarily responsible for one type of ability be reused for a range of purposes and that these overlaps are primarily responsible for neural co-activation during physical and mental tasks . Under this view, concepts of particular objects need not be grounded in the actions a person has performed on that object, but rather in the exaptation of one area of the brain for use in another area. For example, finger sensitivity and mental arithmetic involve the same area of the sensorimotor cortex, and disruption of the shared area leads to both acalculia and finger agnosia . Under the neural reuse view, this co-activation might arise because the shared circuit is specialized for sequencing information or representing arrays, rather than because mental arithmetic is grounded in counting on one's fingers . The neural reuse theory does not deny that action can influence thought, but the theory stresses that not all thought is necessarily grounded in action. The disagreement is one of degree, not of kind.
(b) How does action experience affect action representation?
The theory of embodied cognition maintains that the processing or representation of particular actions relies on prior experience doing those actions. In this view, action representation grows out of action experience. The theory predicts that experience with an action should affect subsequent thought relating to that action. Research shows that experience performing an action can influence thought about that action in at least three different ways. It can affect (i) perception of the action, (ii) discrimination of the action, and (iii) comprehension of language related to the action.
Experience performing an action can change how that action is processed in the brain when it is observed (i.e. how the action is perceived). Studies using functional magnetic resonance imaging (fMRI) found that when experts in one dance style were shown a video of that style, areas of their brains involved in action observation and production showed greater activation than when they watched a video of an unfamiliar dance [11,12]. By testing male and female ballet dancers who perform different moves but are familiar with the moves of their partner, a follow-up study demonstrated that it was the dancers' experience of doing an action, rather than their experience of watching their partner perform an action, that accounted for the greater neural activation . These studies suggest that the neural systems involved in action production influence the neural systems involved in action perception; specifically, having previous experience performing an action is correlated with activation of sensorimotor brain regions when observing that action. Previous motor experience also influences memories of items or objects that we have encountered in the past and the degree to which we like the objects in question [8,14,15].
The kinetic experience of performing an action can help people identify that action even if they have not seen it performed. Casile & Giese  blindfolded participants and taught them to swing their arms as if walking using an atypical gait pattern (one with a phase difference of 270° between the left and right arms rather than the typical 180°). Participants who had the kinetic experience of performing the arm motions corresponding to the atypical gait were more successful at visually discriminating videos of unfamiliar gaits that had phase differences similar to the one they had experienced in training than were participants who had not received the training. However, they were not better at discriminating unfamiliar gaits with other phase differences. These results indicate that specific experience of performing an action improves the ability to visually recognize that particular action—even when the person has never seen the action before.
Previous experience of performing an action can also affect how language related to the action is understood and processed. Using fMRI, Beilock et al.  measured the comprehension and processing of language related either to ice-hockey movements or to everyday actions. Half of the participants had extensive experience playing ice hockey; the other half had none. The authors found that both groups showed similar comprehension and processing of language related to common actions. However, the group with hockey experience showed greater comprehension of language related to hockey moves than the group without hockey experience. Importantly, the relation between experience and comprehension was mediated by neural activation in the dorsal premotor cortex (believed to be responsible for the selection and planning of well-learned motor sequences [18,19]). The more hockey experience individuals had, the greater their level of activation in this area, and thus the greater their comprehension of hockey-related language. This finding demonstrates that when people hear language related to actions they have previously performed, brain regions involved in planning those actions are activated, which may help them process the language faster and interpret the meaning more accurately than individuals who have not had experience performing the actions. The findings also support previous studies that point to the importance of the left dorsal premotor cortex in auditory comprehension of language related to familiar actions [18–22].
Taken together, the studies outlined above provide support for the embodied cognition framework—namely that the internal representations used to perceive, discriminate and comprehend action and action-based language are associated with the sensorimotor system used to perform these actions . Greater experience performing a certain action strengthens the recruitment of the sensorimotor system in internal representations of information about this action—even in the absence of the overt action itself.
3. Gesture and thought
Gesture forms an integrated system with speech and contributes to the meaning listeners glean from speech [23–25]. For example, listeners are more likely to grasp the message conveyed in speech if it is accompanied by a gesture conveying the same message as speech than if it is accompanied by no gesture at all. Conversely, listeners are less likely to grasp the message conveyed in speech if it is accompanied by a gesture conveying a different message than if it is accompanied by no gesture at all [26,27]. But gesture goes beyond modulating the listener's comprehension of speech—it can convey information on its own. For example, listeners can extract information from gesture even if that information is not found anywhere in the accompanying speech [26,27]. Not surprisingly, since gesture forms an integrated system with the speech it accompanies, gestures produced in the context of speech are often difficult to interpret when presented in an experimental situation without speech .
There is considerable evidence that gesture plays a role for the speaker as well as for the listener—that it has cognitive as well as communicative functions. Speakers gesture when their listeners cannot see their gestures (e.g. on the phone or when speaking to a person behind a barrier over an intercom [28,29]). More strikingly, congenitally blind speakers (who have never seen anyone move their hands when they talk) gesture and do so even when addressing blind listeners . Findings such as these indicate that gesturing serves a function not only for listeners, but also for speakers themselves. Indeed, speakers are more fluent, producing fewer errors and verbal hesitations, when they are permitted to gesture than when they are prevented from gesturing [31,32]. Gesturing while speaking also frees up working memory: speakers find it easier to remember a list of unrelated items when they gesture while talking than when they do not gesture [33–35]. Gesturing also provides kinaesthetic and visual feedback that can directly aid problem-solving. People can use gesture to work through different solutions to a problem and gather information about the alternatives through the visual and motor feedback of their own gestures .
(a) What do human gestures look like?
Gestures take many forms. They can be performed with the hands, head or other parts of the body, direct attention towards or away from the speaker and have culturally shared forms or vary according to the speaker's representations. For example, deictic gestures draw attention to objects, people or locations in the environment (e.g. pointing at an object or holding it up for display). Conventional gestures (or ‘emblems’) use a standardized form to convey a culturally specific meaning (e.g. an upward movement of the head used to mean no in Turkey). Representational gestures capture aspects of an action, object or idea either iconically (e.g. moving two fists in the air as though beating a drum) or metaphorically (moving two open hands in the air as though weighing two sides of an argument). These representational gestures are generated on the spot rather than stored in a lexicon (as conventional gestures are), and convey information about a gesturer's thought process or mental representation of an event [25,37].
Representational gestures that depict actions or objects through an iconic mapping to real-world events may be performed from either a first- or third-person perspective. Gestures performed from a first-person perspective are referred to as character-viewpoint gestures . In these gestures, the gesturer assumes the role of the person performing the action and his hands represent the character's hands—for example, swinging a closed fist as if gripping the handle of a tennis racket as the gesturer describes a stroke he made when playing tennis. Gestures performed from a third-person perspective are referred to as observer-viewpoint gestures . In these gestures, the gesturer does not assume a role in the action but views it from the outside; his hands then represent participants and objects in the event—for example, tracing the path of a tennis ball as he describes hitting it over a fence.
Not all representational gestures depict aspects of real-world physical events. Representational gestures can also be used to represent abstract ideas. When they do, they are usually described as metaphoric because they map abstract ideas onto physical actions or features. The gestures themselves are not metaphoric; they convey physical features, movement or space. Rather, the metaphor is contained in the relation between gesture and speech, where speech communicates an abstract concept and gesture adds a physical element to the concept, often providing a link to an action that grounds the abstract language in physical experience. For example, a person might say, ‘we need to think about the future’ and extend a hand forward, thereby displaying a temporal metaphor in which the future is ahead of the speaker. In one common type of metaphoric gesture, the speaker gestures as if holding a solid object in one or more hands while talking about an abstract concept or idea. By gesturing as if holding onto the idea, the speaker indicates that she is treating the idea as a physical object, as though it were a thing that can be given from one person to another, lost, taken apart, etc. Metaphoric gestures can also convey abstract relations by emphasizing parts of the accompanying speech or surrounding physical environment. In one study of mathematical problem-solving, children indicated that the two sides of an equation should receive equal treatment by producing the same sweeping motion under each side of a mathematical equation during their explanations . Although the children's gestures did not convey traditional metaphors, they did highlight an abstract relation (the notion of equivalence) by gesturing to each side of the written equation in exactly the same way. Examples such as these demonstrate how gesture can ground even abstract ideas in physical actions.
(b) Gesture can represent action
Representational gestures are thought to be a type of simulated action (e.g. [25,39]). Recently, Hostetter & Alibali proposed that these gestures result from a direct extension of mentally simulated action and perception. In their view, gesture arises when activation spreads from the areas involved in action planning to those involved in action execution. Character-viewpoint gestures provide support for the view that gesture is rooted in action simply because they resemble the kinematics of actions on objects in the real world.
In character-viewpoint gestures, the actions of the gesturer's hands closely mimic the movements she would make when performing the action in the real world. This similarity may be used, in certain circumstances, to enact familiar action sequences while reasoning or talking about real-world action. The proprioception of performing familiar movements may activate detailed mental representations of objects by simulating acting on the world . Streeck describes the gestures a car mechanic made while talking about problems with different cars. Because the mechanic frequently encountered the same types of problems, he had developed a set of ‘habitualized’ gestures he used when faced with familiar problems. These gestures had similar forms every time he used them (such as turning an invisible ignition key or shifting an imaginary car into second gear) and were closely based on the motor patterns he used when solving the problems in the real world. These types of routinized gestures lie somewhere between iconic representational gestures and conventional gestures because they use the same movement pattern every time.
People's gestures tend to reflect their own experience. For example, Cook & Tanenhaus found that the gestures people produced when talking about a particular task (the Tower of Hanoi puzzle) reflected their kinematic experience solving the problem. The Tower of Hanoi is a challenging task in which people are presented with an array of three pegs in a row and are asked to move a stack of discs of different sizes from one peg to another in a particular order so that larger discs are never placed upon smaller ones and only one disc is moved at a time. Cook & Tanenhaus had adults solve the task and then asked them to explain how they had solved it. When people completed the task on an actual tower before describing how they solved it, they used many character-viewpoint gestures in their descriptions: cupping and moving their hands as if holding and transferring discs up and over the peg. When people solved the task on a computer, they produced fewer grasping handshapes during their descriptions and the trajectory of their gestures was more likely to reflect the horizontal path that the mouse followed (i.e. they moved horizontally from peg to peg rather than moving up and over each peg). Gestures representing actions on or by objects thus reflect the speaker's previous experience with those objects.
Gestures representing the use of objects in actions (as in tool use) are cognitively complex. They require that the gesturer represent not only the motion of the action (say hammering) but also the object involved in the action (the hammer). To depict the use of an object, a gesturer must either gesture as if holding an imaginary object, or use a body part (usually the hand or finger) to represent the object involved in the action. Imaginary object gestures are a type of ‘character-viewpoint’ gesture because the hands are representing the hands of the agent holding the object. Gestures in which a body part represents an object are ‘observer-viewpoint’ gestures because the gesturer does not act as an agent manipulating an object but instead depicts only the action of the object. Imaginary object gestures more closely resemble the actions made when acting on real-world tool objects (e.g. moving the hand shaped as though holding a toothbrush back and forth across the mouth when describing brushing one's teeth). However, they require that the gesturer have a strong mental representation of the tool object involved in the action because there is no physical placeholder standing in for the tool. In contrast, gestures in which a body part represents an object rely on physical substitutes for the object involved in the action (e.g. rubbing the index finger across one's teeth during a description of tooth brushing) and thus might require a less strong or detailed mental representation of the tool object than imaginary object gestures.
Gestures depicting tool use have not been studied in spontaneous conversation, but there is an extensive experimental literature on the types of manual representations people produce when asked to pantomime how tools are used. Adults pantomime these types of events as if holding an imaginary tool in their hand most of the time . But 3- and 4-year-old children frequently use body parts as stand-ins for the tool object rather than manipulating an invisible tool [44–46]. For example, they run their fingers through their hair when asked to portray a hair-combing act, rather than pretending to hold a comb and move it over their hair. One possibility is that the children are not using their hands to represent action, but are instead merely performing the act with their fingers (i.e. literally combing their hair with their fingers). The same phenomenon has been found in aphasics  and schizophrenics , individuals whose symbolic representation systems have been disrupted.
It is unclear whether pantomimes elicited to portray tool use are cognitively different from gestures spontaneously produced to communicate about tool use. Some have argued that tool-use pantomimes involve different neural substrates from those involved in producing communicative gestures (see review in ), a distinction supported by the fact that apraxic patients who have difficulty producing tool-use pantomimes have fewer (or no) problems producing conventional gestures or meaningless hand shapes [50,51]. In contrast, Frey , who finds no differences in activation during tool-use pantomimes and communicative gestures, argues that difficulty producing tool-use pantomime is due to the cognitive demands of representing absent objects.
Gestures in which hands represent hands (and act upon imaginary objects) intuitively seem less cognitively complex than those in which hands represent other things. However, it is clear from the research on tool-use pantomimes that manipulating imaginary objects in gesture is a non-trivial task and involves more than simply recreating the motor patterns performed during actions on the real world. Mental representation of non-present objects is difficult, and people with incomplete linguistic representation systems often rely on a body part to stand in for the absent object. These difficulties highlight the difference between performing a movement sequence as part of a real-world action and performing the same sequence as part of a representational gesture. The kinetic movements may be very similar, but using movement to represent action adds an additional level of complexity.
Gesturing from a first-person perspective (as in imaginary object gesture) may be complex not only because the gesturer needs to mentally represent an imaginary object, but also because the gesturer needs to take the perspective of the agent in the depicted event. Character-viewpoint gestures as a whole seem to involve a more sophisticated mental representation of events than observer-viewpoint gestures because of the need to take the agent's perspective into account. This perspective-taking ability is associated with narrative development in speech. Young children's ability to produce character-viewpoint gestures is associated with better concurrent narrative skills and predicts improvements in narrative skill in the future .
(c) Gesture can influence thought
A great deal of research has shown that the spontaneous gestures speakers produce provide a window onto their thoughts (see  for a review). But there is growing evidence that gesturing can go beyond reflecting thought and can play a role in changing thought. In order to demonstrate that gesturing is causally involved in thinking, we need to manipulate the gestures that speakers produce.
Broaders et al.  asked children to gesture while explaining their solution to a math problem and subsequently gave them instruction on the problems. Children who were asked to gesture before the lesson were more likely to benefit from the subsequent lesson than children who were not asked to gesture. Many of the children conveyed strategies in their gestures that they had not expressed before being asked to gesture. Being forced to gesture activated previously unexpressed concepts. In turn, this expanded repertoire led the children to profit from subsequent instruction.
Gesturing can convey cognitive benefits to the speaker even when speakers are told precisely how to move their hands. Ehrlich et al.  gave a mental-rotation task to two groups of children, each instructed to gesture in a different way. In the task, children were shown two unconnected shapes and were asked to choose from an array of images the shape the two separated pieces would make if they were moved together. The unconnected shapes needed to be moved horizontally or vertically or rotated to create the final shapes. During a mental-rotation lesson, one group was told to show the experimenter with their hands how they would move two pieces together. Children in this group produced both character-viewpoint gestures (e.g. rotating their hands as if moving the pieces) and observer-viewpoint gestures (e.g. tracing the trajectory that the pieces would take). The other group was told to point to the two pieces. The children who produced gestures (either character- or observer-viewpoint gestures) representing the movement of the pieces learned more from the mental-rotation lesson than did children who produced pointing gestures .
As another example, Goldin-Meadow et al.  taught children to gesture in a particular way during a lesson on mathematical equivalence. The gestures (in which a pair of numbers was grouped together by placing a ‘V’ handshape underneath them) conveyed a novel ‘grouping’ strategy that none of the children had used before. The children were then given a lesson on mathematical equivalence and were told to perform the gestures they had learned. Importantly, the new grouping strategy was never used by the teacher, in either gesture or speech. Children who had been told to gesture using the grouping strategy improved more from the lesson than children who were not told to gesture. Moreover, the children who improved were very likely to express the grouping strategy in speech on the post-test, even though they had never heard it expressed in speech during the lesson. Gestures can thus instil new ideas in learners—creating thought in addition to reflecting it.
4. Gesture as a bridge between action and thought
Both action and gesture can affect the mental representation of actions and objects, but gesture's ability to represent action offers a way to ground abstract ideas in concrete actions. Gestures that represent action are actions performed within an imagined world. When gestures simulate action on or by objects, the objects involved in the event must be represented mentally. Actions, on the other hand, are performed on the physical environment. The objects they act on are present and do not need to be represented mentally. Thus, when we perform actions on objects, we are able to offload some properties of the task onto the environment. However, when we use gesture to represent action on or by objects, we must rely on mental sensorimotor representations of the objects involved. This is particularly true for gestures in which the gesturer's hands manipulate imaginary objects. Gestures in which a body part is used to represent an object involved in an action are symbolically complicated because they use one thing (e.g. a hand) to stand for another thing (e.g. a toothbrush), but they also allow some cognitive offloading because the hand serves as a physical placeholder for the object.
Simulating an action on an imagined object in gesture seems to strengthen the link between the action and the mental representation of the object, and does so more than performing the action on the object in the physical world. Beilock & Goldin-Meadow asked participants to solve the Tower of Hanoi puzzle with real discs, and then describe how they solved the puzzle to another person. The largest disc was on the bottom of the stack and needed two hands to lift; the smallest disc was on the top and could be lifted with one hand. Following their explanation, participants were divided into two groups and given the task again. One group solved the task with precisely the same discs (no-switch); the other group solved the task with discs whose weights had been reversed (switch), so that the smallest disc on the top was now the heaviest and needed two hands to lift. Participants in the switch group who had gestured with one hand when describing moving the smallest (and lightest) disc found it harder to execute the task the second time than the first. Moreover, their performance on the reverse weight task was predicted by the number of one-handed gestures they made during their explanation of the first task: the more one-handed gestures they produced, the worse they did on the task when the weights were reversed and the smallest disc required two hands to lift. The gestures produced by participants in the no-switch group had no relation to their performance on the second task. These findings suggest that people who used one-handed gestures to represent moving the small disc represented the disc as light, even though weight was not a relevant factor in solving the task. Representing the small disc as light caused difficulty when the switch group solved the puzzle a second time (where the small disc was no longer light), but not in the no-switch group (where the small disc was still the lightest).
Importantly, when additional participants were given the same tasks but without the explanation phase in between, there were no differences between the no-switch and switch groups—that is, switching the weight of the discs had an effect on subsequent performance only when the participants gestured prior to the performance, and only when those gestures were incompatible with the performance. In a follow-up study, Goldin-Meadow & Beilock  found that gesturing about the task more strongly influenced mental representations of the actions involved in the task than performing the task again (i.e. than acting on the objects). These studies add weight to the claim that representing action in gesture embeds embodied information into mental representations of action. In fact, when the effects of gesturing about action and acting were pitted against one another, gesturing appeared to have a stronger effect on the mental representation of the action than performing the action itself had.
5. Action and gesture in non-human primates
Non-human primates (specifically simians, hereafter referred to as ‘primates’) are extremely adept at performing manual actions. Although not as dexterous as humans, primates are nonetheless able to execute a great number of manual tasks requiring fine motor control (e.g. extractive foraging, delicate grooming and tool use). They are also able to extract information (including how to accomplish certain tasks) from watching others perform actions. Moreover, some primate species, great apes in particular, use a wide range of gestures in communication. Their gestures are used flexibly and intentionally, and at least some communicate specific meanings (see review in ). However, the gestures that primates produce lack the representational elements of human gesture. Whereas many human gestures symbolize actions and objects, ape gestures primarily indicate the gesturer's future actions by performing an abbreviated part of the action that would, in its full version, fulfil his or her goal.
(a) What do primates know about actions?
Primates are able to recognize particular movements in themselves and to determine when their movements are the same as those of others. They can easily learn to perform new actions. However, they are more likely to focus on the goal and the primary method of an action than on the details of specific movements used to achieve the goal . This observation has led some (e.g. ) to consider primates ‘emulators’ and human children ‘imitators’, although meta-analysis across studies shows that primates are capable of both goal emulation and process imitation (see [59,61] for discussion). Even though primates tend to focus on obtaining desirable outcomes rather than on a specific means for achieving those outcomes, they are able to detect small details in movement. For example, some ape species can recognize when an experimenter is copying their movements exactly  and, like humans, apes that are being copied will sometimes try to trick the copying individual into performing bizarre actions or making a mistake. The ability to recognize and learn both the kinematics and goals of actions from others suggests that primates have a mental representation of what action needs to be performed on a particular object, and can form or modify that representation without acting on the object themselves. There is no evidence, however, that primates are able to actively manipulate their representations and rehearse their actions before they attempt an action.
When solving unfamiliar tasks, primates are able to modify their techniques and strategies in response to information acquired during trial-and-error learning, but there is little evidence that they reason through multiple solutions to a problem (so-called ‘mental rehearsal’) before undertaking any actions (; but see ). It is, of course, impossible to say exactly what is going on inside the minds of non-human primates during problem-solving, but they do not exhibit the external behaviours that are associated with mental rehearsal in humans, such as gesturing or practising actions out of their functional contexts.
Early studies with great apes (e.g. ) suggested that primates might, indeed, consider different possible outcomes when faced with difficult problems, but there has been no consistent evidence of primates either gesturing through or acting out different versions of their actions before they act. Kendon  notes that several chimpanzees tested by Köhler  on problem-solving tasks behaved as though they were ‘acting out the wished-for state of affairs in a situation that [they treated] as analogous to the actual one’ (, p. 210). In the examples Köhler and Kendon describe, chimpanzees were presented with a challenging task (stacking boxes, uncoiling a rope, lifting a cage) that they had to perform to gain access to a food item. Köhler describes several cases in which an ape, when confronted with a problem, performed non-functional actions or hand movements that were not directed towards solving the problem at hand. These actions or gestures were thought to be an indication of ‘working through’ the problem before attempting a solution. In one such case, a chimpanzee was presented with a room full of boxes and a banana suspended in the air out of reach. The chimpanzee moved one box underneath the fruit and then eyed the distance from the box to the banana. Then he retrieved a second box, ‘but, instead of placing it on top of the first, as might seem obvious, began to gesticulate with it in a strange, confused, and apparently quite inexplicable manner. He put it beside the first, then in the air diagonally above, and so forth’ (, pp. 46–47). Kendon argues that the aborted actions are ‘pre-enactments’ of different scenarios, and that the chimpanzee ‘embarks on a course of action with the second box, but each time foresees that its outcome will not suit his purposes, so he cuts off, changes course and tries again’ (, p. 210).
These examples are intriguing, but they were made as real-time observations during problem-solving tasks and there have been no comparable observations since that would allow us to conclude that apes do use action or gesture to plan their actions. The more common view is that apes do not rehearse actions. In fact, their lack of rehearsal has been used as evidence that primates are incapable of ‘mental time travel’ (i.e. imagining performing actions in the past or the future ). Primates, like many animals, do perform modified versions of actions (such as biting, fighting or courtship) during play interactions [68,69]. However, while behaviours ‘rehearsed’ during play may help young primates perfect adult behaviours and learn to negotiate social situations, they differ from the targeted mental rehearsal involved in thinking through different versions of an action during action planning.
(b) Primate gesture
It is difficult to directly compare reports of gestures in humans and primates because researchers working in the two areas define gestures differently and often address different questions. Researchers studying primates define gesture as including not only visual movements of the hand, face and body (visual gestures), but also movements that come into contact with other individuals (tactile gestures) and movements that produce audible sounds (audible gestures). Primate researchers are also more likely to focus on gestures that are directed towards other individuals and discount similar movements made when an animal is solitary. These decisions make it particularly hard to determine whether primates ever use gesture as a cognitive aid outside of communicative contexts. The communicative gestures primates produce can be directly compared with communicative gestures produced by humans, although the challenges of determining when a gesture is intentionally communicative and what the gesturer aims to communicate are more difficult when observing primates.
Like humans, primates frequently use facial, manual or whole-body signals in communication. But primate gestures differ considerably from human gestures, particularly when it comes to symbolic representation of the world. Many non-vocal primate signals appear to be involuntary responses to internal emotional states like fear or excitation . Involuntary signals can be effective in communicating the presence of recurring events or goals (e.g. signalling the presence of a predator or asking to mate), but they cannot be employed strategically and almost certainly do not provide a cognitive aid in the way that human gesture does. However, some types of primate gestures are used flexibly in communication. These gestures, observed predominantly in great apes, are often referred to as ‘intentional gestures’ [70,71].
All great apes gesture to communicate. A large cross-species comparative study of great ape gesture found more similarities than differences in the types and uses of gesture across species (; see also [72–75]). Each species had a comparable repertoire size of 20–30 visual and tactile gestures. Subsequent studies have recorded species repertoires of up to 100 gesture types (e.g. ), but these differences can largely be attributed to how narrowly each gesture type is defined in each study .
Great apes use gesture in purposeful and socially complex ways. Their choice of when and how to gesture, particularly their choice of the tactile versus visual modality (whether they touch or do not touch others), is sensitive to whether they can be seen by others (e.g. [71,76,78–80]). This finding is thought to be evidence that apes take the visual attention of others into account when choosing how or when to gesture. There is also evidence that apes gesture to achieve particular goals. They expect responses from others and often wait for a response if a recipient does not respond immediately (see results from several species in Call & Tomasello ). Moreover, at least some ape gestures have specific meanings, and apes often repeat, change or elaborate their gestures when a recipient responds in an undesired way . When a recipient fails to respond at all to an attempted gesture, apes will persist and elaborate their gestures according to whether or not the recipient can see them . There is also some evidence that apes (at least orangutans) tailor their communicative strategies to how successful their initial communicative attempt was, so that they repeat gestures more often when communication has been partially successful and use a wide range of gestures when communication has failed completely .
(c) Comparison to human gesture
Though apes display a sensitivity and flexibility in their gestures that indicate they can use gesture to communicate intentionally, their gestures are distinctly different from the gestures used by humans. Human and ape gestures differ in the degree to which they are combined in structured ways, whether they can communicate a wide range of meanings, and whether they represent or reference real-world events in the same way.
Apes can combine gestures with one another, and two apes can produce gestures in response to one another in a communicative exchange. However, there is no indication that these sequences of gestures are combined according to systematic patterns—either to attract attention before communicating a particular desire, or to communicate a more complex meaning than is possible using a single gesture. Apes' gesture combinations are typically either repetitions of the same gesture or different types of gestures with the same meaning [83,84].
Humans rarely combine the spontaneous gestures that they produce along with speech into gesture sequences . However, when humans gesture without vocalizing (which is the typical situation for apes, who rarely produce their gestures along with vocalizations), they not only routinely combine different manual gestures with one another, but they do so following a systematic pattern; in other words, they use devices characteristic of human language. The clearest examples are the sign languages of deaf communities handed down from generation to generation (e.g. ). However, hearing individuals will also develop complex patterns of gestures when they interact in circumstances where speech is either impossible or inappropriate (e.g. workers exposed to high noise levels or people following religious conventions prohibiting speech), although these systems rarely achieve the complexity characteristic of sign languages. Kendon notes that ‘the more generalized [the] circumstances are, the more complex [the] systems become’ (, p. 292). Thus, systems restricted to a specific type of interaction (say, operating heavy machinery) do not face pressures to adopt greater complexity because they are not used frequently enough, or in enough different scenarios, to require significant modification. When human gesture systems are used frequently in a variety of situations, as in the case of the sign languages of the Plains Indians and Australian Aborigines, they begin to take on the complexities of spoken language.
Strikingly, humans will combine gestures in language-like ways even when they have never been exposed to the structures of a conventional language (spoken or signed). For example, deaf children whose hearing losses prevent them from acquiring the spoken language that surrounds them, and whose hearing parents have not exposed them to a conventional sign language, invent gestures to communicate with the hearing individuals in their worlds. These gestures exhibit many of the properties found in human language, including a simple syntax based on gesture order [87–91]. As another example, when hearing adults with no knowledge of sign language are told to describe a series of events using only their hands, the sequences of gestures they produce tend to follow a systematic order [92,93]. Interestingly, all hearing adults tested in this way display the same order (subject–object–verb), an order that is found in half of the world's spoken languages, and they do so whether or not the order is predominant in their spoken language [94–97].
In contrast to human gestures, ape gestures are almost universally requests for a particular response from the recipient. Tomasello & Camaioni  used this observation to draw a sharp contrast between ape and human gestures, characterizing apes' gestures as imperative and children's gestures as declarative. Where humans (even infants) will gesture to draw attention to an object or to comment on an aspect of the world, apes gesture primarily to request others to interact or leave. Most of the gestures of one ape genus (orangutans) can be categorized into only six types of requests: affiliate/play, move away, share food/object, stop action, co-locomote and take food/object . Other ape species use gesture to communicate fairly similar meanings (e.g. ). So, whereas human gestures can communicate a potentially boundless number of meanings, primate gestures appear to be restricted to initiating, ending and moderating frequent kinds of social interactions.
Another striking difference between ape and human gesture is the lack of deictic and representational elements in apes. Humans use deictic gestures to draw attention to objects in the environment and representational gestures (i.e. character-viewpoint, observer-viewpoint and metaphoric gestures) to refer to objects or events. Great apes almost never use gestures deictically to draw attention to things in conspecific interactions, although deictic gestures are sometimes used by captive apes communicating with humans ([99,100]; for an example of deixis in the wild, see Pika & Mitani ). Most importantly, the gestures apes use in their natural communication systems, even when produced to communicate with humans, do not seem to have any of the representational elements found in human gestures. Many gestures appear to be an incipient action reduced from a full-blown action that evoked a particular response from a recipient in the past; a process called ontogenetic ritualization [71,98]. For example, a shoving action eventually becomes the gesture ‘nudge’ or ‘shoo’ as the recipient learns to predict the gesturer's behaviour from the start of the action and responds appropriately.
It seems likely that most ape gestures began as actions and were co-opted into communicative devices either during ontogeny (via ritualization) or over evolutionary time . Indeed, even the few ape gestures that have been reported to be iconic could have been ritualized from functional actions rather than representing actions in the same way that human gesture does. One commonly cited ‘iconic’ gesture involves an ape's gesturing to indicate the direction it would like another to move by brushing along the recipient's body or swinging an arm in the desired direction (e.g. [103–105]). It is possible that these gestures indicate the direction of desired movement through iconic representation of the action. But they also may be incipient actions or other movements ritualized into gestures from what were once effective pulling or guiding actions. If the latter is the case, then the similarity between the movement of the gesture and the desired action would be incidental rather than an iconic representation of action.
(d) Could gesture serve as a bridge to thought in primates?
It is clear that humans gesture not only to communicate but also to aid their own cognition. The fact that humans gesture to themselves (outside of communicative contexts or when they cannot be seen) has been taken as evidence of gesture's cognitive function. Unlike humans, naturally reared apes have not been found to gesture when alone or when they are behind a barrier. In fact, apes choose not to use manual gestures when they cannot be seen and instead switch to vocal signals or auditory gestures (e.g. [79,81]). One methodological difficulty in making this comparison between apes and humans is that most studies of ape gesture require that a manual movement be directed towards another individual in order for it to be counted as a gesture —in other words, if an ape were to produce a gesture-like movement in the absence of a partner, it would not meet one of the criteria for a gesture.
Although we cannot conclude that the gestures apes use have no effect on their cognition, it seems safe to say that their gestures do not contribute to building mental representations the way humans' gestures can. There is no evidence that apes use gesture in a truly representational way. Their action-like gestures ‘represent’ actions through learned association, not by design. Many, if not most, of the gestures apes use are ‘species typical’ and do not differ across individuals or groups [70,71,76]. Apes can use their gestures flexibly in response to social and communicative contexts (varying when and where they gesture and which gestures to use), but the underlying forms of their gestures are probably chosen from a pre-existing repertoire. This tendency to use the same gestural forms every time differs sharply from humans' use of representational gestures, in which the exact forms are spontaneously generated during communication; they are not emblems or lexical forms—they have no ‘wrong’ forms.
Primates are excellent observers of actions and signals. They can extract information about the world by learning relationships between the signals and subsequent actions of other individuals or between others' signals and events in the external environment . They can understand, interpret and predict actions of others, even when those actions occur out of view . They are able to learn complex novel actions through observing others , and this ability to socially learn manual actions probably contributes to ‘cultural traditions’ in food processing or manipulation of objects [109,110]. However, even though primates can process, learn from and replicate actions, there is no evidence that they can represent actions using gesture.
When placed in the right environment, apes can acquire symbolic communication systems, learning the associations between objects in the world and symbols representing those objects. If apes are taught human-designed communication systems (such as modified American Sign Language or computer-based symbols), their communication resembles, in some but not all respects, the communication of a 2-year-old child (e.g. [111–114]). Moreover, there is some indication that when apes are given access to a symbolic representational system, they can use the system for more than communication. For example, one of the most proficient ape signers, Washoe, used her signs appropriately when she was alone, signing ‘quiet’ when sneaking into a room or signing to her dolls . However, the vast majority of the communication that language-trained apes produce not only is directed towards another, but is also used to get that individual to do something (i.e. to make a request ).
We do not claim that primates are incapable of mentally representing actions or objects, but it is clear that they do not represent actions or objects in their gestures. Without representational gestures, primates cannot link action to mental representation in the same way humans do. It is noteworthy, however, that when primates are taught a symbolic communication system, they do at times exhibit behaviours—such as using communicative signals outside of communicative contexts—that suggest they may be able to use symbols to aid or complement cognition (see, for example, [117,118]). Language-trained apes provide an interesting comparison to both humans and non-language-trained apes because they demonstrate the level of abstract cognition apes can reach when reared in human-like conditions and highlight the importance of rearing environment in the development of cognitive and communicative abilities.
6. Gesture and mental representation in the evolution of language
The gestures primates use in their natural communication systems have little in common with the types of human gestures we have discussed in this paper (deictic, conventional and representational). However, they do resemble human language more than primate vocalizations do. Unlike humans, primates cannot learn new vocalizations (their vocal repertoires are essentially fixed) and their vocalizations seem to be elicited by emotional states rather than employed intentionally to communicate particular goals . Primates have greater flexibility and control in manual communication than they do in vocal communication. They can easily learn new hand movements, and use gestures flexibly in response to the visual attention and reactions of others. In fact, this flexibility in primate gesture is often cited as support for the theory that human language originated as a gestural system.
Many have proposed that human linguistic structure first emerged in gesture and only later spread to vocalization (e.g. [66,120–123]). The prevalence of co-speech gesture in human language  and findings that gesture precedes and predicts children's development of spoken language [124,125] demonstrate that gesture is an integral part of modern human language and not something layered on top of an older verbal system. Representational gesture, in particular, has been suggested as having provided a means of communicating complex events before human ancestors developed the ability to use shared symbols . Indeed, some argue that representational gesture (or, rather, pantomime) was a critical stage in a progression from manual action to spoken language and propose the mirror neuron system as a neural foundation for this transition .
(a) Mirror neurons
The discovery of mirror neurons provides a possible device through which primates might identify similarities between their own movements and the movements others produce. Mirror neurons are visuomotor neurons found in area F5 (and other connected areas) of the primate premotor cortex (roughly analogous to Broca's area in humans). They are unusual in that they discharge both when a primate performs an action directed towards an object and when it watches another individual perform that same action [127–129]. These neurons provide a link between perceived and performed actions and are one possible mechanism through which observed action could become simulated action.
The majority of work on primate mirror neurons has been done on macaque monkeys using single-cell recording techniques. These studies have found several different types of mirror neurons distributed in different areas of the brain. Some neurons respond primarily to the goals of actions (e.g. picking up an object), whereas others respond to both the goals and specifics of the movements (e.g. picking up an object between two fingers) . Primate mirror neurons are primarily activated by the movements or goals of grasping, placing or manipulating actions, and most are specific to one of these actions (i.e. they are activated by only one type of action ). Importantly, however, most monkey mirror neurons respond only when these actions are directed towards physical objects; they do not recognize movements (such as gesture) that simulate goal-directed actions in the absence of those objects . Monkeys do not have to see the object to activate the mirror system, but they do have to ‘believe’ that the object is present. For example, if a monkey is first shown an object and the object is then hidden by a screen, the monkey's mirror neurons will fire when a grasping hand reaches for the now-hidden object (although the response will be smaller than when the object is visible ). If the monkey is shown the grasping hand reaching towards a screen without having first seen the object behind the screen, the monkey's mirror neurons will not fire . Thus, if a grasping movement is directed towards an empty space (as it would be during a representational gesture), primate mirror neurons will either not fire or produce only a weak signal [131,133] (see footnote 1). Interestingly, primate mirror neurons respond to the sounds made by manipulating specific objects (e.g. ripping a piece of paper ) though neither the action nor the object is visible. This strengthens the argument that it is the ‘belief’ that an object is present, rather than the physical presence of the object, that activates the mirror system.
Evidence for a mirror neuron system in humans comes primarily from imaging studies (such as fMRI) and techniques stimulating areas of the brain during behavioural tests (such as transcranial magnetic stimulation) [107,130]. The human mirror system appears to have many of the same properties as the monkey mirror system. It fires for specific motor patterns as well as the goals of motor acts . However, the monkey and human mirror neuron systems differ in at least one critical respect: the monkey mirror neuron system does not fire unless an object is present (or the monkey thinks the object is present); the human mirror neuron system does. The human mirror system responds to empty-handed gestures, that is, to movements made in the air, simulating actions made on an object but without having the object present (though the brain areas that respond to representational actions are not entirely the same as those that respond to object-directed actions [136,137]; see also Skipper et al. , who find activation of the human mirror system during processing of co-speech gestures). This neural response to simulated action in the absence of objects may provide the foundation for understanding gestures as representations of actions on or by objects. The important point for our discussion is that this type of neural response is found in humans but not in monkeys (see footnote 2).
(b) A cognitive leap?
Arbib  proposes that the ability afforded by the mirror neuron system to draw parallels between actions of the self and others paves the way for complex imitation and provides a foundation for the evolution of neural mechanisms supporting representation through pantomime. Pantomime, he argues, was a necessary precursor to protosign, which when combined with vocalizations, evolved into protolanguage in the human lineage. The ability to recognize and imitate the manual actions of others is undoubtedly necessary for a complex gestural communication system to emerge, but it remains unclear how, when and why human ancestors began using gesture to represent elements of the world around them.
In the wild, primates are never exposed to representational gestures. Though many primate species have a rich repertoire of facial, manual and bodily signals, primate gestures lack the representational elements characteristic of many human gestures. Although some primate gestures have predictable meanings, the meanings are not iconically represented (as many human representational gestures are) nor are they culturally variable (as human conventional gestures are). Moreover, although primates can learn new gestures, they do not seem to acquire them through cultural transmission. The majority of gestures used by particular primate species do not vary greatly among captive populations. Idiosyncratic gestures unique to individual animals are frequently observed, but they do not spread through populations to become group-specific gestures, as would be expected if gestures were acquired via cultural transmission (e.g. [73–76]). For primate gestural systems to have developed over time to produce anything resembling pantomime or conventional gestures, primates would have had to develop the ability to add gestures to their communicative repertoires by observing others. There is no evidence that primates have this capacity.
Using manual gesture to simulate and represent actions and objects outside of the context of acting on real-world objects represents a cognitive leap in hominid evolution. It is possible that increased demand for accurate tool manufacture and use in human ancestors drove many cognitive developments, including neural lateralization, more complex mental representations and complex manual imitation. These changes could have altered the nature of the gestural communication system, allowing human ancestors to acquire new gestures through imitation and link them to representations of actions. It seems likely, however, that sweeping social changes would also have been necessary for human language to develop. Studies in which apes are taught human communication systems have demonstrated that apes can learn new gestures or symbols and use them referentially (although not combinatorially), but even when acquiring symbolic communication, apes do not develop the social and representational milestones, such as theory of mind and pretend play, that accompany language development in young children.
The extent to which rearing environment, linguistic development and cognitive development interact with one another is a topic of great interest in human research and is not at all understood in primates . A handful of studies have shown that apes raised in a human-like environment exhibit cognitive skills that apes reared in their natural social groups do not exhibit [140,141]. However, it is not clear which aspects of the rearing environment are most influential and whether the cognitive abilities of human-reared apes are truly different from those of naturally reared apes; apes could differ from humans in their external behaviours because they differ in their motivation to participate in certain types of activities, motivation provided, in part, by rearing the apes in human environments. Given that juvenile primates (especially humans) have an extremely long period of maturation and dependence , there is great potential for interaction between the rearing environment and neural, cognitive and communicative development.
We know that the role gesture plays in human language development is complicated. Gesture both responds to and influences linguistic and environmental variables. Parents' gesturing predicts children's gesturing, which, in turn, precedes and predicts child speech . Children's gesturing also alters the environment for the child by facilitating interaction with parents and thus enriching the child's linguistic input . Comparative developmental research is necessary to investigate and tease apart the respective contributions of environment, action, gesture and cognition to non-human primate communication systems. That said, representational gesture (and the cognitive advantages it brings) appears to be a uniquely human ability. It is unclear, however, which pieces of the puzzle are missing in extant primates. We do not know whether primates lack a neural substrate enabling complex mental rehearsal, the cognitive ability to connect gesture to mental representation or the social motivation to create a rearing environment that would foster the development of these abilities. We hope that future studies on primates will investigate the relationships among action, gesture and cognition during development. Such studies will not only help us understand how these variables influence one another in primates, but also shed light on the relationship among action, gesture and representation in human evolutionary history.
We thank J. G. Foster, K. Brown and M. Cartmill for their comments on the manuscript, and G. Rizzolatti for his helpful discussion. Work described in this paper was supported by NICHD grants P01 HD40605 and R01 HD47450 and NSF Award no. BCS-0925595 to S.G.M., NSF Award no. FIRE DRL-1042955 to S.B. and NSF Award no. SBE 0541957 to S.B. and S.G.M. for the Spatial Intelligence Learning Centre.
One contribution of 12 to a Theme Issue ‘From action to language: comparative perspectives on primate tool use, gesture, and the evolution of human language’.
Footnote 1: The one exception is facial movements that are not directed towards objects (e.g. lip-smacking ). Monkey mirror neurons do respond to these acts even though an object is not involved. However, these facial movements, although communicative, are not representational in the way the human manual gestures we discuss here are.
Footnote 2: It is important to note that all of the work on the primate mirror system has been done on monkeys, but most of the findings of complexity in gestural communication come from great apes. It is possible that the mirror system in great apes is more human-like than monkey-like; however, given the fact that ape gestures lack representational elements, it seems likely that their mirror systems are still significantly different from those of humans.
This journal is © 2011 The Royal Society