In this paper, we review reports and present new empirical data from studies with marmosets and dogs that address the correspondence problem of imitation research. We focus on the question of how it is possible to transform visual information into matching motor acts. Here, the important issue is not the learning of a complex skill, but determining the copying fidelity of animals at different levels of behavioural organization. As a theoretical framework, we suggest a classification in terms of movement, action and result, which shows a positive relationship between the organizational level of imitation and matching degree. While the monkey studies have provided evidence of very precise copying of movements and, to a lesser degree, of behaviours, the dog studies have provided evidence of action copying and the reproduction of results. In a Do-as-I-do study, a dog attempted to reproduce the results of demonstrated object manipulations at the expense of movement details. Transitive actions were more easily replicated than intransitive ones, and familiarity of actions had a major influence. The discussion of these findings addresses the question of the neuronal mechanisms underlying imitation and whether a single mechanism is sufficient to explain the different levels of copying fidelity.
Imitation is a relatively ambiguous phenomenon. For some it is a cheap trick by which an observer saves time and energy in solving a problem by stealing the solution from a master. For others it is one of the most advanced cognitive faculties: the observer acquires information about new techniques while at the same time drawing inferences about the efficiency of the observed methods, the constraints of the situation, and the intentions and goals of the model. Related processes include social mirroring, facilitation of responses and learning about movements of an object that are caused by the actor's movements (Whiten et al. 2004, 2009). Recent conceptual development in imitation research is characterized by a widening of scope, by focusing on questions about the underlying mechanism, action understanding, intentionality, theory of mind, and possible consequences for language and culture.
Between these extremes of theory, further study of the evolution of imitation requires both a systematic overview of the existing bundle of data and the addition of new data where the existing knowledge is insufficient. For instance, the frequently repeated claim that only humans, and to a lesser extent great apes, are able to imitate (e.g. Byrne 2002) does not take into account the fact that the study of imitation is still lacking samples from a broad range of animal species.
(a) The precision of response topography
When comparing species, we must not forget that deciding which animals possess a specific ability depends on how precisely we define that ability. Imitative performance can vary greatly according to the copying fidelity, that is, the degree of matching between the topographies of the demonstrated action and the observer's copy. Animals have been found to reproduce the result or effect of a demonstration by applying an action other than that used by the model (product-oriented copying), to copy the demonstrated actions roughly (e.g. using the same body part), or to copy the action very precisely, matching the movement trajectory (process-oriented copying; see also Tennie et al. 2009a; Whiten et al. 2009). In this paper, we provide a selective review of imitation in non-human animal species, which demonstrates that imitative performance can appear at qualitatively different levels of specificity. We will summarize and organize these findings in a descriptive and functional framework. Finally, we will investigate whether theoretical models of the mechanisms that have been proposed to underlie imitation can explain the multiple levels of copying fidelity evidenced by the empirical studies.
2. How precisely can and do animals copy?
A recurring theme in imitation research, and a challenge for experimentalists to disentangle, is what exactly is copied and what information the observer learns from the demonstrated action (i.e. uses to improve its knowledge or behaviour). In many situations, instead of precisely copying others' actions (and their results on the environment), it is more useful to understand the goal of the demonstrator's actions, and only copy those actions of the demonstrator that are relevant to the task or preferred by the observer (Tennie et al. 2009a). Furthermore, animals may learn through observation how the environment works, by learning about the affordances of objects or causal relationships between them (Tomasello et al. 1993; Zentall 2004). For instance, circumstantial data from the field suggest that New Caledonian crows (Corvus moneduloides) may learn something about the use of a Pandanus tool by seeing it being operated upon (Hunt & Gray 2003). Experimental evidence of this ability has been found in keas (Nestor notabilis), alpine parrots that learned to dismantle several locking devices through observation, but did not copy the demonstrated actions (like twisting a screw; Huber et al. 2001). It has even been shown that pigeons (Columba livia) can learn how to gain access to food by watching the movements of the relevant objects without a demonstrator causing them (Klein & Zentall 2003). Such forms of emulation clearly deviate from imitation, if we consider the latter a form of learning from the behaviour of a conspecific (but see Whiten et al. 2009 for a different perspective).
Few studies have investigated how subjects copy a model's actions independently of the results or the obvious goals of those actions. Only if arbitrary (or non-functional) gesture-like movements or facial expressions that do not convey any message are demonstrated, does the imitation task require mere movement matching. Such a design enables the researcher to determine if the observer recognizes the demonstrated action elements and uses them as a sample against which to match his choice of corresponding action.
In order to explore fully the limits of an animal's imitation power, it is desirable to guarantee that the observer understands that he/she is required to imitate as accurately as possible. Hayes & Hayes (1952) made the first attempt in this direction by training the home-reared chimpanzee Viki to reproduce a variety of actions on command. Later, this so-called ‘Do-as-I-do’ paradigm was used for studying a subject's ability to imitate specified actions in chimpanzees (Pan troglodytes; Tomasello et al. 1993; Custance et al. 1995; Myowa-Yamakoshi & Matsuzawa 1999), orang-utans (Pongo pygmaeus; Call 2001), parrots (Psittacus erithacus; Moore 1992), dolphins (Tursiops truncatus; Herman 2002) and a dog (Canis familiaris; Topal et al. 2006). In all those studies, the demonstrator was a human, mostly the trainer or foster parent.
(a) Do-as-I-do studies
(i) Do-as-I-do in great apes
The lack of rigorous procedures and analyses in the Hayes & Hayes (1952) study prompted several follow-up studies some 40 years later. In the first of these, Custance et al. (1995) presented 48 novel actions to two juvenile (between 4 and 5 years of age) nursery-reared chimpanzees. They were first taught to reproduce 15 gestures on the command ‘Do this!’. From recordings of those arbitrary gestures, two independent observers identified only a third of these as matching responses. Furthermore, very few of the chimpanzees' responses were ‘perfect’ duplicates of the human model's ones. Even some of their clearest imitations were flawed in some way. Either the chimpanzees held the hand in a different orientation to the demonstrated version, or they used the other hand, or a different finger. Perhaps the chimpanzees did not understand that they had to copy the demonstrated action as accurately as possible and therefore sought to match it only superficially.
First-trial reproduction was also found in Myowa-Yamakoshi & Matsuzawa's (1999) study with chimpanzees, but at an even lower rate. Only 5.4 per cent of all presentations were followed by a clear instance of copying. Moreover, the reproductions only occurred in a specific condition; the subjects copied ‘general motor patterns’ (those that occurred in free-play manipulation) in the object-to-object condition (e.g. putting a ball into the bowl) only. Not a single action in the one-object condition (e.g. hitting the bottom of a bowl) or in the object-to-self condition (putting the bowl on its head) was faithfully reproduced. The authors concluded that the chimpanzees were less likely to focus on the details of the demonstrator's body movements, but paid more attention to where the manipulated objects were directed.
This general assessment of the chimpanzees' problems in reproducing ‘pure’ movements was corroborated by two Do-as-I-do studies with the 18-year-old, human-reared, language-trained orang-utan called ‘Chantek’ (Miles et al. 1996; Call 2001). Although Chantek performed more accurately than the chimpanzees, he showed a similar pattern of mistakes. Overall, the study revealed an attentional bias towards certain results or goals and a less differentiated ability to encode observed actions. Again, the details of the movements were barely replicated; the matching accuracy was high for gross body areas, but low for the body parts within those. In general, the more degrees of freedom the movements had, the lower the accuracy was. For instance, Chantek performed better in those actions that involved some contact between his body parts. In sum, it remains controversial as to whether the results provided by these studies are sufficient to conclude that chimpanzees (or great apes in general) ‘demonstrate a capacity for fairly elaborate, if approximate, matching of their own body part actions to those of another ape (the human model)’ (Whiten et al. 2009).
(ii) Do-as-I-do in non-primate species
In accordance with the earlier cited benevolent appreciation, it has even been argued that this advanced level of imitation is limited to humans and the great apes (Miles et al. 1996). But studies with a parrot (Moore 1992), dolphins (Herman 2002) and most recently, with a dog (Topal et al. 2006), have changed this ‘primatocentric’ view. The 4-year-old Belgian Tervueren ‘Philip’ was the first dog to prove capable of learning different human actions as samples against which to match his own behaviour (Topal et al. 2006). As in the ape studies reviewed earlier, Philip was first tutored to repeat human-demonstrated actions on command (‘Do it!’) and then to generalize his understanding of copying to untrained action sequences and to actions shown by other people. In another test, Philip demonstrated the recognition of a modelled object-to-object action in terms of the initial state, the means and the goal. But again, the topography of the movement patterns revealed similar limitations to the great apes.
Taking the results of all Do-as-I-do studies together, several open questions remain and these invite further investigation. First, what level of accuracy may non-human animals achieve? Both the tested actions and the collected data are too variable to draw firm conclusions. In general, the studies so far did not provide compelling evidence that animals can copy every action that they were shown in sufficient detail. Whiten et al. (2004), who reviewed the primate studies, concluded that ‘compared with children, who may show recognizable matching on all of the actions in the battery used, fidelity is typically low overall’ (p. 40). But perhaps other training regimes and systematic variation of demonstrators can elucidate more reliably what animals can and cannot imitate.
Second, what kind of rearing and previous training is necessary for an observer, whether it be a human child or a non-human animal? Importantly, Philip was trained as a service dog, that is, to assist his disabled owner in tasks such as opening doors, picking up items and switching lights on and off. Furthermore, the great apes tested in Do-as-I-do studies were more or less ‘enculturated’, raised by humans or in human environments. The same is true for Moore's (1992) grey parrot ‘Okichoro’.
Third, would dogs make the same kinds of systematic errors as great apes? The difficulty of replicating body-oriented actions compared with object-oriented ones is a seemingly universal pattern. But the data from Philip are not sufficient for comparison with the ape studies. Furthermore, the imitators often confuse the actions that were shown with similar ones that were already stored in their action repertoire. Replication of these findings would point to a fundamental difficulty of transforming visual information into motor acts that have no stored counterpart and no functional equivalent.
(b) The Joy experiment
In this study, we examined the imitative ability of the dog further by investigating three kinds of comparisons: (i) the comparison between object-oriented and body-oriented actions; (ii) the comparison between functional and non-functional actions; and (iii) the comparison between familiar and novel actions to be reproduced on command. Finally, we investigated the dog's ability to wait and engage in other behaviours before replicating the previously seen actions (deferred imitation).
First, ‘Joy’, a female Weimaraner, was trained to perform eight actions on verbal command (‘Do it!’; see electronic supplementary material for methods). As in the study of Topal et al. (2006), the match between the human's action and the trained actions of the dog was defined primarily on the basis of functional correspondence: the dog was rewarded for showing grossly matching actions, rather than for achieving a movement copy with high fidelity. As soon as Joy's performance reached a high, asymptotic level, she was tested on different types of novel actions without reinforcement.
These tests revealed an interesting pattern of results, which on the one hand match the findings with Philip, but on the other hand extend them in a number of important ways. If confronted with actions that are new but composed of familiar elements, Joy rarely deviated from the demonstrated ones, irrespective of whether she had to copy object-oriented or body-oriented actions, but if she did deviate, she did so only by choosing other trained actions (for more details of the results, see electronic supplementary material). It is likely that these ‘mismatches’ resulted from memory problems, such as pro- and retro-active interference, rather than from a copying inability. This seems to apply to actions that have not been trained before, but are composed of movements or behavioural elements from her action repertoire (II: novel actions). In the few mismatching cases, Joy responded initially with a training action or an action from her repertoire and later approximated what was shown (see Whiten et al. 2004 for similar descriptions in apes). Alternatively, she started with the shown action and then showed routine behaviour like sniffing and searching.
Particularly interesting from the perspective of the composition of complex movement patterns is how compound actions, or actions that are composed of clearly distinctive parts, are copied. When Joy was presented with action sequences (III) that were composed of two training actions, she matched only a third of them. Sometimes she performed just the second action, indicating recency effects. Equally poor was Joy's matching ability when confronted with exotic actions (IV), that is, extremely unusual actions of which her body should be capable, but that she had never performed before (and that are therefore missing from her action repertoire). The critical question here is whether a dog could spontaneously create movements (single ones or action sequences) from observation. Especially informative are non-functional, gesture-like movements, because neither action results nor the demonstrator's goal could be used to infer the action.
Joy did not replicate any of the exotic actions on the first trial, but showed some tendency to approximate the action in three trials with object-oriented actions. Interestingly, when these actions were again demonstrated about a year later, Joy functionally matched them using her body's most effective parts, like the mouth for picking up a towel (instead of the hand as shown by the human demonstrator). Of course, such matching tendencies may also be due to local enhancement and/or affordance learning (Zentall 2004). The intransitive (body) actions, however, were not even approximated (see also Tennie et al. 2009b, for a similar result).
In order to test whether Joy's copying was reflexive or purely facilitative, rather than the result of an enduring representation of the demonstrator's behaviour, we conducted a so-called deferred imitation test (V). As expected, Joy's matching degree decreased with the increased delay of the command. However, she could perform correctly with delays shorter than 5 s and once matched a familiar action even after 35 s.
The final test addressed explicitly whether Joy would copy ‘blindly’ or would try to make sense of the action and then re-create the most effective or ‘rational’ solution (Range et al. 2007). We required Joy to copy actions for which the ‘target object’ was not (or no longer) present (so-called vacuum actions). For instance, the human demonstrator ‘jumped over nothing’, ‘imitated drinking (on all fours on the ground) from nothing’ and ‘put a ball into nothing’ (for more methodological details and a full list of vacuum actions see the electronic supplementary material). Without exception, Joy responded by performing an action that was in context or functionally similar. In the case of jumping, Joy jumped over the ‘real’ hurdle, which was standing nearby. After observing drinking, Joy ran to the bowl and sniffed at the grass. Joy manipulated the ball in various ways after observing the ball being put into a non-existent box.
Taken together, the Do-as-I-do experiments with the two dogs Philip and Joy are comparable with those of great apes in that the same factors seem to influence the matching degree of imitation. Like apes (see Call 2001), dogs are not particularly sensitive to details of the actions, but mostly achieve a functional fit. Their actions seem to be goal-directed and object bound, and shortcuts reveal that they are often driven by efficiency (Range et al. 2007). As reported for apes (e.g. Myowa-Yamakoshi & Matsuzawa 1999), dogs show similar tendencies of perseveration, as in novel situations they fall back into the attractors of training actions. And finally, superior performance with object manipulations in comparison with body-oriented movements is not only congruent with the findings from great apes, but also with those of autistic children (Heimann et al. 1992).
(c) Two-action tests
From the experimental point of view, a weakness of the Do-as-I-do paradigm is the lack of some important controls. In particular, if objects are involved or movements are targeted towards specific locations, enhanced attention towards the object or the outcome may suffice to trigger a more or less matching behaviour. Experimental psychologists have therefore developed a kind of acid test for imitative learning, the so-called two-action test, to control for both social influences and emulation/enhancement effects (Zentall 2004; see Miller et al. 2009 for a recent, pure example with dogs). It involves comparing two groups of observers watching demonstrators that differ in their body movements but create identical (or symmetrical) changes in the environment.
A bias in favour of demonstrator-consistent responding implies that the subjects copied one or both of the observed actions. Interestingly, the greatest body of evidence of this kind of action imitation comes from birds (see Zentall 2004 for a review). Budgerigars (Melopsittacus undulatus), European starlings (Sturnus vulgaris), pigeons and Japanese quail (Coturnix japonica) have provided evidence for this. The birds observed a demonstrator using its beak or its foot to depress a lever or plate and subsequently made preferential use of the same effector. However, a weakness of this ‘beak/foot two-action procedure’ is that the movements involved are very simple (pecking and stepping), and therefore it is very unlikely that birds learn by observation how to perform these movements. What they instead may have learned is what response to use in a specific situation. Theorists have distinguished this form of imitation as context imitation (Byrne 2002) and stimulus–response (S–R) learning by observation (Saggerson et al. 2005). For instance, observation of a conspecific facilitated reversal of a conditional discrimination in pigeons (Dorrance & Zentall 2002; Saggerson et al. 2005). This form of imitation may be outcome-insensitive or ‘blind’, rather than goal directed, failing to show learning about response–outcome (R–O) relationships (McGregor et al. 2006).
(i) Body part imitation
A more stringent test of whether animals can learn a new movement by observation should involve the demonstration of at least one action that is unlikely to be performed by animals without the opportunity to witness its performance. We applied this methodology, permitting two groups of marmosets (Callithrix jacchus) to observe a demonstrator using one of two alternative techniques to remove the lids of baited film canisters, and compared their initial test responses with one another and with those of a third group of marmosets that were never given the opportunity to observe a demonstrator (Voelkl & Huber 2000). The results from the latter group showed that one technique (hand-opening) is a quite common response for marmosets, while the other technique (mouth-opening) is very unlikely, and could thus be considered a behavioural ‘peculiarity’.
Imitation learning may be said to have occurred if subjects show a significant elevation in the frequency of an observed action over the normal probability of its occurrence. This was indeed the case. Both groups of observers preferred to open the canisters using the same method as their models. While the observers of the mouth model used both hand and mouth to open the canisters, the observers of the hand model never used their mouths. Since both models brought about identical changes to the canister (removal of the lid), the differential test behaviour of the animals suggests that they indeed replicated the model's behaviour, rather than having learned about certain properties of the canister.
From a functional point of view, one may ask whether this matching has any significant advantage over learning about the apparatus plus using one's own preferred method. Therefore, we altered the task slightly in a second test series by closing the lids of the canisters more firmly. After this change, mouth-opening was the only method that could lead to success; the animals could not produce the necessary leverage with hand-opening. Thus, while the first test asked only for the preferred opening technique of the observers, the second test asked whether the subjects could actually switch to mouth-opening if this is necessary for task completion. In this second test all except one observer of the mouth-opening model succeeded with mouth-opening, while not a single observer of the hand-opening group could open the canisters. This result indicated that paying attention to how a skilful model solves a problem and then attempting to do it similarly is truly a case of learning, in the sense of an adaptive modification of behaviour (Lorenz 1977).
(ii) Behaviour matching at the action level
Recent theories of imitation have dissected the imitative act into two components: the body part used and the action performed (Chaminade et al. 2005). The finding that marmosets would copy mouth versus hand use could be seen as strong evidence for body part imitation, in the sense of using the same body appendages or parts to achieve the demonstrated outcome, but not that a new movement was learned per se.
A step further in the question of copying fidelity would be to experimentally examine the degree of convergence between actions performed with the same limbs of skilled demonstrators and observers. Bugnyar & Huber (1997) provided some evidence for imitation at this action level, where a compound method is composed of a string of behaviour elements. Five marmoset observers were allowed to watch a physically separated model pulling open a pendulum door to gain access to food inside the box three times. When these subjects were tested, the door could be either pushed or pulled. Three of the five observers, but none of the six non-observing control subjects, spontaneously opened the door by pulling. They not only showed a bias in favour of the demonstrated method, but also acted in a manner very similar to the model. Two of them copied all action elements in the appropriate order. Considering the combined probability for spontaneous occurrence of these parts (p = 0.073), this is very unlikely to be owing to chance (p = 0.045). Pulling the door was obviously not a simple act, but rather a compound behaviour that could be split into four independent elements plus one dependent element: (i) using the left hand; (ii) taking the door at the right edge; (iii) pulling; (iv) holding the door wide open with one hand; and (v) taking the food. Thus, two marmosets imitated at a level of specificity that has not been achieved by monkeys before.
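The chance argument above can be illustrated with a short calculation. The following is a minimal sketch, not a reconstruction of the original analysis. It assumes that each of the five observers independently has a probability of 0.073 of spontaneously producing the full pulling sequence, and asks how likely it is that at least two of them do so. The function name `prob_at_least` and the binomial model are assumptions introduced here for illustration.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """Binomial tail: probability of k or more successes in n independent trials."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


# Assumption: 5 independent observers, each with a spontaneous probability of
# 0.073 (the combined probability quoted in the text) of showing the sequence.
p_chance = prob_at_least(k=2, n=5, p=0.073)
print(f"P(at least 2 of 5 match by chance) = {p_chance:.3f}")
```

Under these assumptions the tail probability is about 0.046, close to the reported p = 0.045. The text does not state which test was actually used, so the agreement should be read as suggestive only.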
(iii) Movement imitation
The previous two examples of marmosets opening food containers with the same body part or the same sequence of hand actions as the model can be summarized as imitation at the action level. For copying to qualify as movement imitation, however, the observer must be able to copy the specific response topography, that is, the specific action by which the response is made (Zentall 2004). We therefore asked whether the seemingly high similarity between model and observer movements for lid opening in Voelkl & Huber (2000) may also reveal a convergence of movement patterns.
To investigate whether matching occurred at such accuracy, Voelkl & Huber (2007) performed a detailed analysis of the movement trajectories of the animals' heads during the mouth-opening process. First, by tracking the head movement during the opening of the film canisters on a frame-to-frame basis (25 frames s−1) they could reconstruct the motion trajectories of the head and calculate five basic movement parameters: the change in the inclination of the head during the opening action, the overall direction of the movement, the total path length (the length of the path described by the head's centre of gravity from the beginning to the end of the action), the direct path length of the movement (the length of the direct line from the position of the head at the beginning to the position of the head at the end of the action) and a detour factor defined as the ratio of the total path length to the direct path length. The underlying parameters of successful opening movements varied considerably, suggesting many degrees of freedom for the path to successful opening. Thus, any similarities between movement patterns of model and observers could not be explained solely by functional constraints.
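As an illustration of how such parameters can be derived from a tracked trajectory, the sketch below computes the total path length, the direct path length, the detour factor and the overall direction from a list of frame-by-frame positions. It is a minimal sketch under stated assumptions: positions are two-dimensional (x, y) coordinates of the head's centre, the function name `trajectory_parameters` and the example track are hypothetical, and the change in head inclination is omitted because it requires orientation data in addition to position.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def trajectory_parameters(track: List[Point]) -> Dict[str, float]:
    """Summarise a frame-by-frame track of (x, y) head positions."""
    if len(track) < 2:
        raise ValueError("need at least two tracked positions")
    # Total path length: sum of the distances between consecutive frames.
    total = sum(math.dist(a, b) for a, b in zip(track, track[1:]))
    # Direct path length: straight line from the first to the last position.
    direct = math.dist(track[0], track[-1])
    dx = track[-1][0] - track[0][0]
    dy = track[-1][1] - track[0][1]
    return {
        "total_path": total,
        "direct_path": direct,
        # Detour factor: 1.0 for a perfectly straight movement, larger otherwise.
        "detour_factor": total / direct if direct > 0 else math.inf,
        # Overall direction in degrees, measured from the positive x-axis.
        "direction_deg": math.degrees(math.atan2(dy, dx)),
    }


# Hypothetical track in arbitrary units, not data from the study.
example = [(0.0, 0.0), (3.0, 4.0), (6.0, 0.0)]
params = trajectory_parameters(example)
```

A detour factor close to 1 corresponds to the "short, direct" movement path described for the model, whereas larger values indicate a more roundabout movement.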
Rather than taking an indirect and ‘lean’ route to estimate the similarity between model and observer movements by asking ‘blind coders’, Voelkl & Huber (2007) calculated the matching degree on the basis of the movement parameters. A discriminant function analysis of the orthogonalized data produced a function with clearly distinctive discriminant scores for movements of the model and the non-observers (figure 1). The function classified 13 of the 14 observer movements (93%) as model movements and only one as a non-observer movement. Principal components analysis proved it impossible to ascribe the discriminative power of the discriminant function to a single movement parameter, but possible to ascribe it to the combination of at least four parameters: observer movements resembled the model in showing only a slight rotation of the head and a relatively short, direct and flat movement path (see figure 1a for three examples). These parameters varied considerably in non-observers; therefore, precisely how to open the lid cannot be considered an all-or-nothing behaviour. These results indicate that what the marmosets copied was not only the body part or an overall action, but details of the movement in the sense of a pathway of the model's head through time and space.
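The logic of this classification can be sketched with a much simpler stand-in: a two-group Fisher linear discriminant fitted on two movement parameters and then applied to new movements. This is an illustration only. The published analysis used five orthogonalized parameters; the two features chosen here (head rotation in degrees and detour factor), all data values, and the function names `fit_fisher` and `classify` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (head rotation in degrees, detour factor)


def _mean(points: List[Point]) -> Point:
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def _scatter(points: List[Point], m: Point) -> Tuple[float, float, float]:
    """Entries (sxx, sxy, syy) of the within-group scatter matrix."""
    sxx = sum((p[0] - m[0]) ** 2 for p in points)
    sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in points)
    syy = sum((p[1] - m[1]) ** 2 for p in points)
    return sxx, sxy, syy


def fit_fisher(group_a: List[Point], group_b: List[Point]) -> Tuple[Point, float]:
    """Return the discriminant weights w and the midpoint threshold."""
    ma, mb = _mean(group_a), _mean(group_b)
    a1, b1, c1 = _scatter(group_a, ma)
    a2, b2, c2 = _scatter(group_b, mb)
    a, b, c = a1 + a2, b1 + b2, c1 + c2  # pooled scatter [[a, b], [b, c]]
    det = a * c - b * b
    if det == 0:
        raise ValueError("pooled scatter matrix is singular")
    dx, dy = ma[0] - mb[0], ma[1] - mb[1]
    # w = inverse(pooled scatter) applied to the difference of group means.
    w = ((c * dx - b * dy) / det, (-b * dx + a * dy) / det)
    threshold = w[0] * (ma[0] + mb[0]) / 2 + w[1] * (ma[1] + mb[1]) / 2
    return w, threshold


def classify(p: Point, w: Point, threshold: float) -> str:
    """Assign a movement to the group whose side of the threshold it falls on."""
    score = w[0] * p[0] + w[1] * p[1]
    return "model" if score > threshold else "non-observer"


# Hypothetical training data, not values from the study.
MODEL = [(5.0, 1.1), (7.0, 1.2), (6.0, 1.15), (8.0, 1.1)]
NON_OBSERVERS = [(30.0, 2.0), (45.0, 1.7), (25.0, 2.6), (40.0, 2.3)]
w, threshold = fit_fisher(MODEL, NON_OBSERVERS)

# Hypothetical observer movements to be classified.
observers = [(9.0, 1.2), (10.0, 1.3), (35.0, 2.2)]
labels = [classify(p, w, threshold) for p in observers]
```

The proportion of observer movements labelled "model" plays the same role as the 13 of 14 reported above: it measures how often an observer's movement falls on the model's side of the discriminant boundary.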
Although at first sight the close match between the movements of the model and her observers, in stark contrast to the substantial deviation of the non-observers' movements, indicates high-fidelity imitation, such matching could also be the result of convergence on common motor patterns through practice. As the observers had more experience with the canisters than the non-observers (i.e. more trials with completely shut canisters before measurements were taken), it is possible that individual learning leads to convergence towards the model's efficient action. However, the data suggest that this is very unlikely. Although our analysis of the effect of practice on the topography of the lid-opening movement cannot be based on straightforward inferential statistics, we could not find any trend (not to say significant correlation) in either the observers or the non-observers that would indicate that their movements became more similar to the model's movements with increasing experience (for statistical details and a scatter plot of these data, see electronic supplementary material). Therefore, we can confidently conclude that the extremely high similarity between model and observer movements is a result of observation rather than of individual learning.
Is there any other evidence for observational copying of movements in a highly precise manner in non-primate animals? From a functional point of view, do we know any example that indicates that the precision of copying movements by observation matters? Would animals benefit less or not at all if they imitate only superficially? Recently, zoologists have made a surprising discovery.
(d) Copying a difficult hunting technique in fish
The archer fish (Toxotes chatareus) is known for its ballistic hunting technique, with which it knocks down aerial insect prey from heights above the water level that are otherwise inaccessible. Their weapon is a precisely aimed shot of water. They can even learn to release their shots to hit moving prey. This is quite a remarkable accomplishment, as the shooter must take both the target's three-dimensional motion and that of its rising shot into account (Schuster et al. 2006).
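The geometric core of this problem can be stated in one line: the shot has to be directed at the place where the target will be when the water arrives, not where it is at release. The sketch below is a deliberately simplified illustration, not a model of what the fish compute. It assumes one-dimensional horizontal target motion at constant speed and a known rise time of the shot; the function name `aim_point` and all numerical values are hypothetical.

```python
def aim_point(target_x: float, target_speed: float, rise_time: float) -> float:
    """Horizontal position the target will occupy when the shot arrives.

    Assumes constant horizontal target speed and a known shot rise time.
    """
    if rise_time < 0:
        raise ValueError("rise_time must be non-negative")
    # Displacement of the target while the shot is rising.
    lead = target_speed * rise_time
    return target_x + lead


# Hypothetical values: target at 0.10 m, moving at 0.2 m/s, shot rises for 0.05 s.
future_x = aim_point(target_x=0.10, target_speed=0.2, rise_time=0.05)
```

In this simplified picture, a shooter that ignores the lead term aims at the target's current position and misses by the distance the target travels during the rise.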
Recent evidence suggests that these fish can learn to release their shot in a way that accounts for the target displacement during the shot's rise. Performing the so-called ‘leading strategy’, the fish assumes its final orientation towards the future point of the hit. How do they acquire this precision? Interestingly, training with horizontal motion results in acquisition of the leading technique. But does this acquisition require long periods of trial and error learning? If so, can the fish shortcut this by observing a skilful model?
Yes, they can! When a group of archer fish was unable to practise because a dominant individual prevented them from shooting, they learned the complex sensorimotor skill from extensive observation of the skilled group member. Probe trials that took place before any training was given revealed that they did not shoot or were unable to score hits with their sharp jets, even at the lowest height and speed. After the dominant fish had learned the task, it was removed and the other fish were allowed to shoot. In almost their first tests their performance approached that of the long-trained model, and was far above the score that the model was able to reach when it had started its practice (Schuster et al. 2006).
Schuster and colleagues (2006) concluded that this remarkable instance of social learning in archer fish implies that observers can ‘change their viewpoint’, mapping the perceived shooting characteristics of a distant team member into angles and target distances that they must use later for a successful hit. This means that not only are body movements precisely copied, but also the relations between the movements of the model and the movement of the prey.
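The geometry behind the leading strategy described in this subsection can be made concrete with a minimal sketch. It rests on simplifying assumptions that are ours and not taken from Schuster et al. (2006): the shot is launched straight upwards, air resistance is ignored, and the target moves horizontally at constant speed and fixed height. The function names and all numerical values are hypothetical and serve only as illustration.

```python
import math

def rise_time(height_m, launch_speed_ms, g=9.81):
    """Time (s) for a vertically launched shot to first reach height_m.

    Assumes drag-free ballistic motion: height = u*t - g*t**2 / 2.
    """
    disc = launch_speed_ms ** 2 - 2.0 * g * height_m
    if disc < 0:
        raise ValueError("shot cannot reach this height at this launch speed")
    # Smaller root of the quadratic: the shot reaches the height on its way up.
    return (launch_speed_ms - math.sqrt(disc)) / g

def lead_distance(target_speed_ms, height_m, launch_speed_ms, g=9.81):
    """Horizontal distance (m) the target travels while the shot rises.

    Under the assumptions above, this is how far ahead of the target's
    current position the shot would have to be aimed.
    """
    return target_speed_ms * rise_time(height_m, launch_speed_ms, g)

# Hypothetical values: target 0.3 m above the water moving at 0.2 m/s,
# shot launched at 4 m/s.
print(lead_distance(0.2, 0.3, 4.0))
```

Even in this idealized form, the required lead depends jointly on target height, target speed and shot speed, which indicates why an observer would have to map the model's shooting characteristics onto its own position rather than copy a fixed aiming angle.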
3. A framework for multiple levels of copying fidelity
In the previous section we have reported experiments that provide fairly good evidence of the potential to imitate at different levels of specificity and precision. They contribute substantially to the current evidence from the literature on social learning in non-human animals by showing that marmosets (and probably archer fish) are not only able to deliver an action from their motor repertoire upon demonstration of the same action by a model, but can also precisely adjust their movements to those that have been demonstrated. Of course, we do not consider the creation of new movements from scratch, down to the flexing of muscles, but rather the modification of parameters of the movement trajectories and the underlying forces. Furthermore, we have shown how difficult, though not impossible, it is for animals to copy pure movements that have no environmental effects compared with object-related actions.
Is there a common theoretical umbrella that covers the performances of a wide range of species in imitation tasks and contributes to the question of the underlying mechanisms? A meta-analysis of the literature including our own findings suggests a classification in terms of results, actions and movements. Comparing the demonstration and the copy at these three organizational levels can reveal either two or three different matching degrees for body-oriented and object-oriented actions, respectively (figure 2).
An observer's focus on the results of the demonstrator's manipulation of objects is likely to prevent faithful copying of action details or movement parameters, as was observed, among many other cases, in keas dismantling locking devices (Huber et al. 2001). If no objects are involved and no changes to the environment are produced, this source of information is missing, and animals seem to have a much harder time matching the demonstration, as reflected by Joy's higher success in copying object-oriented actions compared with reproducing body-oriented actions.
At the next organizational level, the copying of actions has been shown in both dogs and marmosets, but with different accuracy. Dogs have copied jumping actions or turning around, whereas marmosets copied the details of pulling a pendulum door and the use of the mouth to open a canister. It seems here as if monkeys generally achieve higher levels of precision than dogs. However, differences in their body schema and in their evolutionary history have to be considered. The body plan can be viewed as an evolutionary constraint on motor behaviour. Primates have their forelimbs free, they manipulate objects in various ways and they use their hands in coordination with their head and mouth. Dogs, however, exhibit little head-mouth coordination, and their object-related actions are also less precise and less fine-tuned. In addition, communicative gestures involving head, hand and body play a greater role in primates’ behaviour than in dogs. Taken together, we may think of different abilities to control one's body, resulting in differences in imitative performance despite possibly similar capacities for goal-directed observational learning. Recent support for the latter has come from experiments on ‘rational’ imitation in human infants (Gergely et al. 2002), dogs (Range et al. 2007) and chimpanzees (Buttelmann et al. 2007).
At the third level of matching, not only results and actions, but also movements are copied, as demonstrated by the accurate analysis of the mouth-opening technique of marmosets. The functional advantage here is that the observer does not need to understand the action or why it is efficient to produce the results. For example, if inexperienced animals like infants need to learn how to survive on their own, not having lifelong experience, copying faithfully in the absence of insight would be highly beneficial (Huber 1998).
From the functional and evolutionary point of view, this variation in copying fidelity is not very surprising. It is more difficult to see, however, whether any of the mechanisms that have been proposed to underlie imitation can lead to such a variable behavioural performance. Also, the question remains as to whether variability arises from different mechanisms responsible for perception-action correspondence or for controlling imitation.
4. Which mechanism can explain the different levels of copying fidelity?
A refreshing new perspective that our review has brought into focus is the general theoretical problem that cross- and within-species variety of copying fidelity poses for ‘the unity of mechanism’ assumption. The question raised by our selective review and the new data that we present is whether there is just one single mechanism that underlies all these types of imitative competence and levels of specificity of matching, or whether a variety of different mechanisms may be involved. The general task would be the conversion of actions observed in others into actions executed by oneself. In other words, visual input needs to be represented appropriately and then transformed into corresponding motor output.
To start with the most parsimonious assumption, one may ask whether the cross-species and cross-functional differences in levels of specificity of imitation can be managed by a single mechanism. Byrne (2002, 2003), for example, claimed that most cases of imitation in non-human animals can be explained quite parsimoniously by social mirroring (the observation of a conspecific's behaviour triggers the same behaviour through motivation effects) and response facilitation (the increased probability of performing a response already in the observer's repertoire by observation of another animal performing that response). The hypothetical neural mechanism of the latter is the priming of brain records. If the sensory inputs received during observation and execution of the action are similar, observation of the response will activate a ‘record’ of the action, and this increases the probability that the action will be performed. However, McGregor et al. (2006) argued that response facilitation would be possible only for relatively transparent actions (yielding similar sensory feedback when observed and executed), and its effects are of limited duration. Therefore, it could only account for a very limited set of the imitation data. It would neither fit well with the marmosets’ lid-opening replication nor with Joy's deferred imitation performances.
The translation of sensory input into motor output poses a significant computational challenge, particularly in ‘opaque’ cases in which the observation of the actor and the execution of an action by an imitator result in sensory inputs in different modalities and frames of reference (see Catmur et al. 2009). A prevalent answer to this question of the proximate causation of imitation is that of direct matching, the activation of neuronal correlates of observed action patterns in the observer's current repertoire. Mirror neurons seem capable of accomplishing this job (Rizzolatti 2005). These bimodal cells, found in the premotor cortex (area F5) of rhesus monkeys (Macaca mulatta), respond in a similar manner to simple, goal-directed manual actions, whether made by the monkey itself or by an individual it is watching. Soon after their discovery (DiPellegrino et al. 1992), it was argued that mirror neurons mediate imitation (Jeannerod 1994). The mirror system ‘resonates’ by actively matching the observed action with motor responses stored in the premotor cortex, thereby allowing fast, efficient responses to that action.
Theorists have proposed that even those cases of behavioural matching that are based on the most stringent test of imitation, the two-action paradigm, can be explained by ‘resonance’ on the basis of mirror neurons (Byrne 2003; Rizzolatti 2005). The marmosets' use of either mouth or hand to deal with the same task would rely on visuo-motor mapping from seen parts of the model's body to equivalent parts of the self. Accordingly, the observation of mouth actions directly activates similar motor programs in the monkey premotor areas, leading them to resonate and consequently, to give rise to an overt replication of the observed gestures. Indeed, in addition to mirror neurons coding for hand movements such as grasping, researchers have also found mirror neurons for mouth movements (Ferrari et al. 2003).
There is, however, a fundamental problem with this account. Mirror neurons become active regardless of the effector (the hand or the mouth) used to achieve a specific goal (e.g. grasping an object). Therefore, they would be indiscriminately activated with respect to the two actions used in the two-action test, which differ only in the effector, not the goal. Conversely, they would not be helpful for the imitation of actions that use the same effector for different purposes (e.g. pushing an object away, or pulling it towards the body).
The data from the Do-as-I-do studies are ambiguous with respect to their support for the role of mirror neurons in imitation. The fact that familiar actions are reliably reproduced but novel actions are not matched with high precision, in addition to the fact that elements of familiar actions are used to compose matching responses, fits well with the direct matching hypothesis. However, the dog Joy was able to reproduce faithfully one (of two) novel body-oriented and five (of six) novel object-oriented actions. In particular, the precisely matched object-oriented actions were unfamiliar and cannot be considered as part of her action repertoire. This is clearly at odds with the direct matching hypothesis of imitation.
With respect to transitive and intransitive actions, the overall difficulty in reproducing the latter would again support the involvement of mirror neurons, as they do not respond to the sight of a hand mimicking an action or to meaningless intransitive movements (Rizzolatti 2005). Furthermore, none of the ‘exotic’ actions was matched by Joy in the first trial, and demonstrations of vacuum actions were merely responded to in a functionally similar way to the familiar goal-directed actions. It seemed as if Joy was trying to make sense of the action by searching for its goal. For instance, she responded to the mimed hurdle jump with a jump across an object standing nearby (but not where the mimed jump was performed).
For the vast majority of mirror neurons in macaques, the sensory-motor congruence, that is the similarity between the action when observed and the action when executed, is broad and confined to the goal of the action. The second hypothesis regarding the function of mirror neurons was therefore in terms of ‘action understanding’, not by having explicit or reflexive knowledge about the similarity of perceived and executed actions, but by coding the goals and consequences of actions, rather than the details of the actions themselves (Rizzolatti et al. 2001). An observed action acquires meaning for the observer when it activates motor schemas whose outcomes are known to the observer. In other words, the observer understands a perceived action by simulating, without executing, the agent's observed movements. Data from studies in which the observer is placed in a ‘meaningful’ situation, but with experimental sensory conditions being different from those that typically trigger mirror neurons, have supported this view (Umiltá et al. 2001; Kohler et al. 2002). However, the motor properties of the mirror system only represent an agent's ‘motor intention’ of an object-oriented action, not an agent's ‘social intention’ or its ‘communicative intention’ (Jacob & Jeannerod 2005).
An alternative explanation of Joy's response to the vacuum actions would be in terms of Gergely & Csibra's (2003) theory of action and goal understanding. Joy may have interpreted these pretence-like intransitive versions of transitive actions as ‘real’ transitive actions, attributing goals to them and then emulatively producing some action directed towards the same goal by activating a goal-relevant action from the available motor repertoire. However, this third mechanism of goal attribution (in addition to action-effect associations and simulation procedures), called teleological reasoning, has been proposed for humans (Csibra & Gergely 2007). Although Range et al. (2007) provided recent evidence that dogs are able to imitate goal-directed actions selectively, further experiments are needed to establish whether their goal attribution is guided by inference about the efficiency and relevance of the demonstrated actions.
A major objection to the role of mirror neurons in imitation is that understanding an action does not by itself facilitate its replication. For an action to qualify as imitation in the restricted sense of movement (or bodily) imitation, the observer must be able to learn the specific response topography, that is, adopt the idiosyncratic form of the model's movement (Zentall 2004). Rizzolatti (2005) has proposed two mechanisms for imitation learning. One is learning, by observation, a new motor sequence that is useful to reach a certain goal. As the whole sequence of actions used by the marmoset demonstrator in Bugnyar & Huber's (1997) study to open a pendulum door was by itself improbable, it can be considered as a new action pattern, in its entirety not stored in the monkey's premotor cortex. Of course, the pulling action may be part of the action repertoire of a marmoset. But the matching of the entire action complex appears to require a fairly advanced representation, which then guides the composition of a fluid motor sequence in the observer.
The other mechanism for imitation learning (Rizzolatti 2005) is the substitution of the motor pattern spontaneously used by the observer in response to a new motor pattern shown by a demonstrator. It has been suggested that imitative learning is implemented by interactions between the core imitation circuit (STS–PFG–F5: superior temporal sulcus; part of the rostral sector of the inferior parietal lobule; rostral sector of the ventral premotor cortex), the dorsolateral prefrontal cortex (BA46) and a set of areas relevant to motor preparation (Iacoboni 2005; see Ferrari et al. 2009). When the observer sees that, for instance, another grip is more efficient than the one previously used to reach the goal of an action, this new grip is coded in STS. The learning process consists of the production of the motor pattern that activates, via backward connections, those PFG neurons that receive the sensory copy of the desired action from STS. The comparison between the visual aspect of the performed action and the sensory copy of it will allow a modification of the internal motor pattern, until this pattern produces an action similar to the observed one. Similar conceptual frameworks have been applied to motor learning and sensorimotor control; they are based on multiple pairs of ‘predictor’ and ‘controller’ models that process feedforward and feedbackward sensorimotor information, respectively (Wolpert et al. 2003).
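The comparison-and-adjustment process described above can be sketched as a simple error-correction loop. The following is a toy illustration under our own assumptions and not an implementation of the circuit proposed by Rizzolatti (2005) or of the model pairs of Wolpert et al. (2003): the ‘predictor’ is a hypothetical linear mapping from motor parameters to visual outcome, and the update rule, learning rate and tolerance are arbitrary choices.

```python
def forward_model(motor):
    """Hypothetical 'predictor': maps motor parameters to the visual
    appearance of the resulting action (assumed here to be linear)."""
    return [2.0 * m for m in motor]

def adjust_motor_pattern(target_visual, motor, rate=0.2, tol=1e-3, max_iter=1000):
    """Modify the motor pattern until the produced action looks like the
    stored sensory copy of the observed action (target_visual).

    Returns the adjusted motor pattern and the number of corrections made.
    """
    for step in range(max_iter):
        produced = forward_model(motor)
        # Compare the visual aspect of the performed action with the sensory copy.
        error = [t - p for t, p in zip(target_visual, produced)]
        if max(abs(e) for e in error) < tol:
            return motor, step
        # Nudge each motor parameter in the direction that reduces the mismatch.
        motor = [m + rate * e for m, e in zip(motor, error)]
    return motor, max_iter

# Hypothetical example: the observer starts from its spontaneous pattern
# [0.0, 0.0] and adjusts it towards the observed action [1.0, -0.5].
pattern, corrections = adjust_motor_pattern([1.0, -0.5], [0.0, 0.0])
print(pattern, corrections)
```

The sketch shows only the logic of the argument: an existing motor pattern is not replaced wholesale but is repeatedly modified until its visual consequences match the observed action.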
Would this model of action adjustment constitute the hypothetical mechanism that can account for the findings of Voelkl & Huber's (2007) study? Unfortunately, this so-called ‘low-level resonance mechanism’ was thought to be confined to humans. ‘During evolution, the mirror system evolved as a system whose main aim was to match sensory information to personal motor knowledge of action meaning. This system became progressively richer and more complex, and, in humans, came to include intransitive actions and detailed specifications of how an observed action is executed. This evolved mirror system became the basis for reproducing actions performed by others; that is, for imitation.’ (Rizzolatti 2005, p. 75). According to this view, imitation is a cognitive faculty that evolved later from the mirror neuron system following the acquisition of new matching properties by mirror neurons.
Is there any other model available that would explain the kind of fine-tuning modification of an action by observation that has been observed in marmosets and archer fish? Obviously, this would exclude models for which the key to imitation is observation-triggered activation of existing motor representations. Among them are generalist or associative models, which rely solely on task- and species-general processes of associative learning and action control. For instance, the ‘associative sequence learning’ (ASL) model (Heyes & Ray 2000; see Catmur et al. 2009) explains imitative capacity in terms of learned perceptual-motor links (contiguity-based ‘matching vertical associations’) of action units that become sequentially combined by action observation. Such fine-tuning would also be incompatible with specialist or transformational theories, which suggest that the correspondence problem is solved by activating a (human-specific) innate, cognitive mechanism that represents observed actions in a special-purpose ‘supramodal’ or symbolic code (Meltzoff & Moore 1997). These theories would also require the observers to have had a motor representation of the specific, idiosyncratic form of the model's action before they observed the model.
If further experimentation confirms that the monkey mirror system is a closed system linked to objects, and imitation in humans is based on an advanced mirror network, one would need to find a third mechanism to explain the cases of action adjustment and high fidelity copying after observation in marmosets, as well as copying of intransitive acts in parrots, dogs, chimpanzees and dolphins. Furthermore, one needs to explain how the high level of control of imitation, as it has been found in ‘rational imitation’ studies of dogs and chimpanzees, is implemented in the brain. Interestingly, neurons with congruent perceptual-motor properties have recently been found in birds (Prather et al. 2008), and the functional link to learning and imitation has also been considered (Tchernichovski & Wallman 2008). We may therefore propose as a null hypothesis that the mirror system in monkeys may be embedded, as in humans, in a larger network of feedforward and feedbackward models, and that it is widely distributed across vertebrate species, rather than being confined to primates.
This work was supported by funding from the European Community's Sixth Framework Programme (to L.H.) under contract number NEST 012929 and by the Hungarian Research Fund (to A.M.; T049615). We would like to thank Cecilia Heyes, Gyorgy Gergely, Andrew Whiten, Claudio Tennie and an anonymous reviewer for their comments or assistance at various stages of the manuscript. We also thank Anna Wilkinson, Ewen Glass and Andrew Whiten for improving the English.
One contribution of 13 to a Theme Issue ‘Evolution, development and intentional control of imitation’.
© 2009 The Royal Society