Royal Society Publishing

From action to language: comparative perspectives on primate tool use, gesture and the evolution of human language

James Steele, Pier Francesco Ferrari, Leonardo Fogassi


The papers in this Special Issue examine tool use and manual gestures in primates as a window on the evolution of the human capacity for language. Neurophysiological research has supported the hypothesis of a close association between some aspects of human action organization and of language representation, in both phonology and semantics. Tool use provides an excellent experimental context to investigate analogies between action organization and linguistic syntax. Contributors report and contextualize experimental evidence from monkeys, great apes, humans and fossil hominins, and consider the nature and the extent of overlaps between the neural representations of tool use, manual gestures and linguistic processes.

1. Historical background to the present issue

It has been recognized since at least Darwin's day that the human hand may have evolved adaptively to facilitate the control of tools [1, p.138], and that the human vocal tract has evolved to facilitate the articulatory gestures of spoken language [1, p.138]. But how closely coupled were these adaptive trends in the hominin lineage? Scientists have frequently considered the possibility of common underlying organizing principles in the neurophysiology of (usually spoken) language and of manual praxis, focusing, for example, on the domain-general implications of primate encephalization [2], on parallel schedules of development across domains during human ontogeny [3], on similarities in hemispheric lateralization of function [4], or on the emergence of gestural communication as an evolutionary precursor of speech [5–8]. However, in earlier formulations, arguments for such a coupling were often complicated by clinical observations of dissociations between deficits in the linguistic and praxic domains, as well as by cases of divergent functional lateralization in healthy subjects; while to many linguists, the analogy between linguistic syntax and action organization has sometimes seemed too loosely defined to carry much interpretive weight.

The papers in this Special Issue examine tool use and manual gestures in primates as a window on the evolution of the human capacity for language. Two quite recent scientific developments make this an opportune moment to revisit this topic almost 20 years after Gibson and Ingold edited the ground-breaking Tools, Language and Cognition in Human Evolution (1993), which addressed a similar theme. The first is the now-widespread clinical and experimental use of methods, such as functional magnetic resonance imaging (fMRI), that were still in their infancy when earlier reviews such as Gibson & Ingold's [9] appeared. Such methods enable highly targeted hypothesis testing in both clinical and non-clinical settings, and can very usefully complement evidence obtained with longer established techniques. The second is the development of a novel, coherent and experimentally well-supported neurophysiological hypothesis of a common architecture for processing certain key aspects of manual actions and of language, namely the ‘mirror neuron system’ or ‘mirror system’ hypothesis [6,8,10,11]. Discovery of the mirror neuron system has shed considerable light on the functional properties of a fronto-parietal network of predominantly motor-related brain regions involved in action organization. Localized in their initial discovery to area F5 of the monkey's premotor cortex [12,13], a probable homologue of Broca's area in humans, and then found also in the inferior parietal cortex [14,15], mirror neurons fire during both action execution and action observation. This capacity has been considered the basis for action understanding. The presence of a parieto-frontal mirror system has then been demonstrated in humans [16–19]. In particular, it has been shown that this system is strongly activated during imitation [20] and it has been suggested, on the basis of several studies, that it plays an important role in speech comprehension [21].

Historically, paleoneurological work on endocasts of fossil hominins has emphasized the expansion of cortical language areas, notably Broca's area, as a distinctive structural marker associated with the emergence of human language capacities in earlier hominins (Homo habilis [22]; Homo ergaster [23]; but cf. [24,25]). However, a recent tracer study of the Broca's area homologue in living non-human primates indicates similarities to humans in connectivity and network architecture that may have provided early hominins with pre-adaptations for language [26]. Functionally, mirror neurons in area F5 in monkeys are activated by manual grasping actions as well as by ingestive and communicative orofacial gestures [27,28]; observations in human subjects have meanwhile also shown that Broca's area contains motor representations of hand movements as well as of speech-related actions (cf. [29]). This evidence has suggested to some scientists that human speech and language could have evolved by co-opting neurophysiological mechanisms involved in the organization of manipulative and ingestive actions. Subsequent work has supported the hypothesis of a close association in humans between some aspects of action and of language representation, in both phonology and semantics [6,21,27,30]. Nevertheless, a significant theoretical problem remains for any hypothesis that would derive language evolutionarily from action organization: namely, whether or not the action system can provide a sufficiently close analogue to linguistic syntax [11,31,32].

The parts of the Special Issue related to manual gestures focus primarily on manual actions involving tools. Tool use provides an excellent experimental context in which to investigate the analogy with linguistic syntax, for several reasons. Tools extend the effector organ (the hand and arm), and in complex tool use (defined by Johnson-Frey [33] as tool use that ‘converts the movements of the hands into qualitatively different mechanical actions’), tools provide a greater range of possible operations than can be achieved with the innate reaching and grasping capability of the hand alone. This requires both semantic knowledge of individual tools' functions, and a generative set of rules for their effective use. Complex tool use typically also requires asymmetrical coordinated bimanual action (in which each hand plays a complementary role; [34]), which has been found to be the most reliable elicitor of population-level right-handedness in captive African apes [35]. Asymmetrical bimanual coordinated actions provide a context for hierarchical embedding, with the discrete but complementary actions of each hand needing to be described in a nested action syntax; while long sequences of such actions organized towards a larger goal also create long-range dependencies (where a preparatory action at one time step is meaningful only in relation with another action that is executed at a later time step). Finally, there is an extensive archaeological record of hominin tool manufacture and use, which can be examined in tandem with the fossil anatomical evidence of the evolving hominin brain, hands and vocal tract to assess theories of the coupled or decoupled evolutionary history of our human capacities in the two domains.

2. Current research themes as illustrated by contributors to this issue

Among the most significant recent discoveries concerning the neurophysiology of action organization in primates are the discoveries of mirror neurons (as noted above; cf. [12]), and of the learned incorporation of the tool into an extended representation of the effector organ in the body schema [36]. Both discoveries were made in macaque monkeys. Two papers in this Special Issue build on these discoveries, and extrapolate evolutionary and comparative insights from observations of monkeys' tool-use learning. Iriki & Taoka [37] propose that the abstract cognitive functions of the inferior parietal cortex in humans derive from an expansion of areas originally involved in computing sensorimotor transformations for reaching and grasping actions, and emphasize the evolutionary importance of cortical plasticity (and the learned incorporation of the tool into the body schema), as seen in the learning-induced changes in the cortical micro-architecture of monkeys trained in tool-using tasks. They develop a speculative hypothesis for the evolution of increasing cognitive abstraction in tool use, and suggest that the brain mechanisms that subserve tool use, located in the parietal cortex, may bridge the gap between gesture and language by exploiting the same principles of spatial information processing to realize novel mental functions that are detached from body constraints.

In a more specific experimental context, Macellini et al. [38] meanwhile demonstrate the ability of macaques both to learn functional tool properties and then generalize them to novel objects, and to generalize functional tool use to novel tasks. However, when investigating the possibility of tool-use learning by observation of a demonstrator, Macellini et al. also find that macaques do not appear to be able to translate the visual presentation of a novel tool-using action demonstrated by an experienced third party into the production of a corresponding motor action themselves, although some forms of facilitation of tool interaction are present. As a speculation, they conclude that the common sequential organization of tool actions and speech and the overlap of activation, for both functions, of ventral premotor cortex and Broca's area, suggest that a basic organization of the motor system for hand and mouth actions has been exploited for the emergence of new functions that rely on the same mechanisms.

Two papers report experimental evidence of action organization in tool use by captive chimpanzees that may also have some relevance to the evolution of language. It is sometimes suggested that there may be common mechanisms involved in vocal tract gestural units and in manual action units at the level of motor control. Calvin [39] suggested that aimed throwing of stone projectiles by hominins could have provided a preadaptation for speech motor control, because of the demands this action makes for precision in movement timing; Calvin also noted that skilful hammering (as in nut-cracking) requires similar patterns of brachiomanual coordination. In a case of social tool use (captive apes throwing faeces or wet chow at human visitors as they pass by the enclosure, with the projectile acting as a tool to elicit a desired reaction from the visitors), Hopkins et al. [40] report findings of associations in chimpanzees between aimed throwing ability and communicative ability. More interestingly, they examined whether specific brain structures are related to such behavioural skills, with surprising results: they found a correlation between aimed throwing ability and white-to-grey matter ratios in the homologue of Broca's area and in the motor-hand area of the precentral gyrus (with the effects more pronounced in the hemisphere contralateral to the preferred throwing hand). The same workers have also found that in captive chimpanzees, aimed throwing ability is associated with greater size of the posterior cerebellum [41].

No study has yet been conducted on the extent of any brain morphological correlates of chimpanzee individual ability with a non-social tool in a nut-cracking task, which has been described as ‘probably the most demanding manipulatory technique yet known to be performed by wild chimpanzees’ [42, p.174]. However, Frey & Povinelli [43] found evidence that chimpanzees display anticipatory grip selection in a task involving a sequence of acts aimed to extract a piece of food using a tool (a dowel). Critical for the task was the type of grip used to grasp the object, as it revealed the capacity to anticipate the forthcoming task. In humans, this anticipatory ability is linked to activation of a network implicated in response selection including frontal and parietal regions as well as the bilateral cerebellum (which is likely to be involved in feed-forward predictions of the sensory consequences and motor costs of a motor action). They briefly note the possible analogy with the phenomenon of coarticulation in gestural phonology.

Stone tools provide the longest and best-preserved archaeological record of the evolution of tool use in hominins, and there have been numerous attempts to discern indirect evidence of the emergence of language in the stone tool record ([4,44,45]; but cf. [46]). However, a necessary first step is to gain a clearer understanding of the organization of actions that would have been required to produce and use the tools that archaeologists recover. Two papers report the use of experimental archaeological techniques (the replication of Paleolithic stone tools) to elucidate contrasts and similarities between stone tool use in different tasks. To assess what was distinctive about skilled tool use by early stone tool-making humans (when compared with present-day chimpanzees), Bril et al. [47] compare and contrast features of action production and task organization in nut-cracking by chimpanzee and human subjects and in conchoidal fracturing of stone by human subjects, finding that the stone knapping task (replicating early archaeologically attested Oldowan techniques of flake removal from a single-platform core) is much more complex. They suggest that understanding human brain evolution as it affected skilled action execution in the stone-flaking task requires us to focus not on particular cortical areas in isolation, but rather on the coordinated evolution of different components of cortico-cerebellar systems. In particular, the marked expansion of the frontal cortico-cerebellar system in chimpanzees and humans appears to be consistent with their increased social learning capacities, exemplified in their similar learning strategies of fine motor skills such as tool use.

Stout & Chaminade [48] meanwhile use functional brain imaging to contrast cortical aspects of action organization in a similar Oldowan stone-flaking task with those involved in the production of a later Lower Palaeolithic tool type, the Late Acheulean handaxe. Whereas the Oldowan task activates cortical areas involved in visuo-motor grasp coordination (including anterior inferior parietal lobe and ventral premotor cortex), but not the inferior frontal gyrus (IFG), the Late Acheulean task also activates the dorsal right IFG (pars triangularis), an area associated with more abstract action representation and greater hierarchical task complexity and with possible involvement of lateralized visuospatial working memory. This seems to reflect the relatively complex goal hierarchy of the Late Acheulean task, which involves both a greater number of discrete sequential knapping events, and long-range dependencies between individual events in an extended sequence. Stout & Chaminade also review the current status of alternative ‘gestural’ and ‘technological’ hypotheses of language origins, drawing on current evidence of the neural bases of speech and tool use generally, and on recent studies of the neural correlates of actions based on Palaeolithic technology.

The brain activation patterns of human subjects replicating the stone tool technology of Neanderthals have not yet been studied experimentally. However, Ambrose [49] notes the appearance by about 300 000 yrs BP of composite tools such as spears with hafted stone points made by Neanderthals and suggests that their assembly rules may be analogous to linguistic grammars. In addition, Neanderthals also appear to have been predominantly right-handed [4], suggesting the presence of a human-like left cerebral lateralization of function. It is, therefore, interesting to ask whether or not this species was also capable of human-like speech. Barney et al. [50] attempt to estimate the potential of the Neanderthal vocal tract to produce human-like articulatory gestures, concluding that the principal contrast between this species and modern humans lies in the more pronounced facial flattening of modern human skull morphology and the associated reduced length of the front (oral) resonating cavity. They make some progress in the difficult task of reconstructing this extinct species' vocal tracts, although their results do not resolve the question of whether or not this contrast with modern human facial architecture would have compromised Neanderthals' speech potential.

As mentioned above, Stout & Chaminade explicitly draw an analogy between action organization in a complex task such as production of a Lower Palaeolithic stone handaxe, and linguistic syntax. At a more general level, Pastra & Aloimonos [51] develop a framework for analysing the grammar of naturally occurring actions, suggesting that the key features of a minimalist generative grammar (nested and tail recursion, and ‘merge’ and ‘move’ operations) must also characterize the generative grammar of action, particularly for actions involving tools (where a set of unimanual-sequential and bimanual-synchronous movements may be necessary to set up a framework for final action execution to achieve the goal). The examples they give of nested recursion all also involve asymmetric coordinated bimanual actions (e.g. ‘grasp with hand1 knife, pin with knife bread—grasp with hand2 fork, pin with fork cheese, lick with tongue cheese—bite with teeth bread’, where hand1 and hand2 are the two hands, and the sequence set off by dashes is (Pastra & Aloimonos propose) an example of a recursively nested action structure). Pastra & Aloimonos's action grammar may bring us closer to understanding the commonalities between action organization in tool-using tasks, and linguistic syntax (for additional recent discussions from alternative perspectives see Glenberg & Gallese [52] and Tettamanti & Moro [53]).

Communicative manual gestures have often been invoked as an evolutionary bridge between instrumental actions and syntactically ordered human vocal communication [5,7], although Stout & Chaminade [48] suggest that parsing of complex manual tool-use sequences during social imitation might have provided such a bridge for earlier hominins without the need to invoke a separate communicative gestural stage. Liebal & Call [54] note that the difficulty of categorically differentiating actions from communicative manual gestures in great apes may relate to the fact that many gestures are derived from non-communicative actions through phylogenetic or ontogenetic ritualization. Social learning represents a third mechanism whereby gestures can emerge out of actions. Liebal & Call suggest that given such a continuum between action and gesture, the heuristic classification of movements as communicative gestures requires the presence of some or all of the following features: motoric ineffectiveness, waiting for a response, gaze alternation and persistence. They cite a recent summary of a systematic comparison of gestures in apes and macaques by Call & Tomasello [55], which found that chimpanzees and orangutans more often incorporate objects in their gestures. As Liebal & Call note, this correlates with these species' greater propensity to use tools in the wild, and may therefore be indicative of a common neural substrate for tool use and gestural communication. In terms of continuities with human language, they also note some evidence that chimpanzees show population-level right-handedness for manual gesturing; however, great ape communicative gestures are still typically imperative, dyadic and lacking in abstraction. In contrast, Cartmill et al. [56] discuss the uniqueness of human representational gestures—gestures that often resemble the actions on objects which they represent, but which are not in themselves motorically effective. They suggest that whereas in non-human primates gestures are typically abbreviated versions of actions that lack symbolic abstraction, humans have the ability both to deploy more abstract representational gestures (influenced by individual experience) and to use these in support of cognitive problem-solving, as well as in social communication. Cartmill et al. note, for example, that familiar instrumental actions can become routinized to simulate and represent problem-solving strategies in those task domains, and suggest that focusing exclusively on the communicative function of gestures obscures their role in support of the gesturer's own cognition.

Finally, Roby-Brami et al. [57], in an evolutionarily oriented review of aphasic and apraxic syndromes in humans, point out that transitive gestures (those involving a tool or other object) are more complex than intransitive gestures (which do not, and which are typically communicative, such as waving goodbye), and that deficits in transitive gestures are also more closely associated with classical apraxic syndromes. They observe more specifically that deficits in pantomiming of tool-using actions are associated with damage to the left IFG, while deficits in intransitive gesture (although less well understood) appear to be less closely linked to impairments of the left cerebral hemisphere. Roby-Brami et al. also note that cerebral lateralization for praxis is more strongly linked to language dominance than to manual preference, indicating commonalities at the level of semantics and conceptual knowledge, and that there is evidence for a convergence between language and praxis at the syntactic level. Although some clinical evidence of double dissociation in the incidence of aphasia and apraxia suggests that the two left-hemisphere-lateralized systems are functionally distinct, there are clearly substantial overlaps between the neural networks subserving language and praxis.

The papers in this Special Issue demonstrate a wide range of approaches to the study of primate tool use and to the action–language relationship. Five of the papers report work carried out as part of the HandToMouth project, which was funded by the European Commission's Sixth Framework NEST Pathfinder scheme (cf. papers by Macellini et al., Bril et al., Stout & Chaminade, Barney et al. and Roby-Brami et al.). Six additional papers were contributed by scientists working outside the framework of that particular project, but whose work overlapped with and complemented it. Co-authors of three of the latter papers (Frey, Iriki and Pastra) had also acted as advisers or external reviewers at earlier stages of the HandToMouth project. Undoubtedly, there remain many unsolved problems, and there are numerous additional research dimensions (such as the search for precursors in living non-human primates of human cortical control specifically of vocal gestures) that could not be explored here. Nevertheless, we believe that these papers collectively make a coherent and substantial contribution to our understanding of the evolution of tool use and language, and we sincerely thank all their authors for their support and participation.


We are grateful to the EC for financial support for the HandToMouth project (EC FP6, contract no. 29065) including its three annual scientific meetings, to Manu Davies as project administrator, and to the following who participated in one or more of those meetings as external scientific advisers: Raoul Bongers, Scott Frey, Kathleen Gibson, Joachim Hermsdörfer, Atsushi Iriki, David Ostry, Katerina Pastra and Valentine Roux. We also gratefully acknowledge the help of all the referees who commented on individual papers submitted to this Special Issue, and Joanna Bolesworth of the Royal Society for her considerable patience and support during preparation of the final publication.

