Grist and mills: on the cultural origins of cultural learning

Cecilia Heyes

Abstract

Cumulative cultural evolution is what ‘makes us odd’; our capacity to learn facts and techniques from others, and to refine them over generations, plays a major role in making human minds and lives radically different from those of other animals. In this article, I discuss cognitive processes that are known collectively as ‘cultural learning’ because they enable cumulative cultural evolution. These cognitive processes include reading, social learning, imitation, teaching, social motivation and theory of mind. Taking the first three of these types of cultural learning as examples, I ask whether and to what extent these cognitive processes have been adapted genetically or culturally to enable cumulative cultural evolution. I find that recent empirical work in comparative psychology, developmental psychology and cognitive neuroscience provides surprisingly little evidence of genetic adaptation, and ample evidence of cultural adaptation. This raises the possibility that it is not only ‘grist’ but also ‘mills’ that are culturally inherited; through social interaction in the course of development, we not only acquire facts about the world and how to deal with it (grist), we also build the cognitive processes that make ‘fact inheritance’ possible (mills).

1. Introduction

The term cultural learning has been used increasingly in the past 20 years to refer to a broadly defined group of psychological processes, such as reading, social learning, imitation, teaching, social motivation and theory of mind [1]. They are known collectively as cultural learning, or cultural cognition [2], because they are thought to enable cumulative cultural evolution, i.e. the non-genetic inheritance of information in a way that allows individual and group phenotypes to achieve a progressively better fit with the demands of the social and physical environment [3,4]. Many researchers interested in the evolution of human cognition believe that this kind of cultural inheritance is what ‘makes us odd’ [5]. It is what makes the lives of contemporary humans—with our built environment, science, technology, art, political and economic systems—so very different from the lives of other animals, including those of our closest living relatives.

This article concerns the origins of cultural learning. Do the different types of cultural learning just happen to be able to support cultural inheritance, or have they been adapted to fulfil this function? If they have been adapted for cultural inheritance, to what extent have the adaptations been produced by genetic and by cultural processes? Researchers from the Santa Barbara school of evolutionary psychology assume that cultural learning is made possible by genetic adaptations; by an array of ‘innate modules’ or ‘instincts’ selected specifically for their capacity to support cultural inheritance [6,7]. Even researchers who typically eschew nativism, and emphasize the power of cultural evolution, sometimes imply that the capacity for cultural learning is inborn [5,8]. For example, Tomasello & Herrmann suggest that ‘ … human children come into the world ready to “collaborate”, as it were, with forebears in their culture, by adopting their artefacts, symbols, skills, and practices via imitation and instructed learning’ [8].

The idea that cultural evolution is made possible by genetic or ‘biological’ adaptations for cultural learning is both simple and plausible. It suggests a straightforward rooting of cultural evolution in biological evolution, and makes the reasonable assumption that there would have been selection pressure in favour of cognitive processes that enable cumulative cultural evolution. However, in this article, I review recent research in comparative psychology, developmental psychology and cognitive neuroscience indicating that there is surprisingly little evidence that cultural learning is based on cognitive mechanisms that have been genetically adapted specifically to enable the social transmission of information. No doubt, like nearly all complex phenotypic traits, the cognitive mechanisms of cultural learning are at some level genetic adaptations; they have been shaped by natural selection to fulfil some function(s); for example, to enable learning about predictive relationships between events, or to support precise visuomotor control. But there is very little evidence that they are genetic adaptations for cultural learning—that they have been shaped by natural selection specifically to enable the social transmission of information.

Instead I propose that the specialized features of cultural learning—the features that make cultural learning especially good at enabling the social transmission of information—are acquired in the course of development through social interaction. This implies that the cognitive processes that comprise cultural learning are themselves culturally inherited; they are cultural adaptations. They are products as well as producers of cultural evolution. We tend to assume that ‘grist’—facts about the world, and techniques for dealing with the world—are culturally inherited, and that this is made possible by genetically inherited ‘mills’—psychological processes that enable us to learn the grist from others. In contrast, the ‘new thinking’ in this article proposes that it is not only the grist but also the mills that are culturally inherited; that the mechanisms of cultural learning are forged and transmitted through social interaction.

The article focuses on three examples of cultural learning: reading (or literacy), social learning and imitation. I have chosen reading because it provides a relatively unambiguous example of the cultural inheritance of cultural learning—a proof of principle. Social learning makes an interesting contrast with reading because it is heterogeneous, not uniquely human, and highlights a psychologically important distinction between core mechanisms of learning, and perceptual, attentional and motivational input mechanisms [9]. Imitation is an especially telling example because it is widely regarded as a uniquely human adaptation for cultural inheritance, and yet recent research suggests that the capacity to imitate is socially constructed in the course of development.

2. Reading

No one doubts that reading is an immensely powerful form of cultural learning; a cognitive process that enables those who are literate to access a huge store of information acquired by previous generations. Furthermore, almost no one doubts that reading has been made possible by cultural evolution. Written language emerged too recently in human history for there to be genetic adaptations for reading. What is perhaps not so widely appreciated is the radical nature of the changes that are wrought on the neurocognitive system by learning to read [10]. This section provides a brief overview of these changes. They remind us that social experience, like ‘genes’, can have profound effects on the mind and brain. It can create whole new systems of thought; systems that could easily be mistaken for innate modules.

Research on the psychological mechanisms involved in reading is informed by experiments examining the speed of processing and the kinds of errors made during reading by healthy literate people, and by people with various types of brain damage. According to one of the most prominent models, the ‘dual route cascaded model’ (DRC) [11], the full corpus of data from these studies implies that each competent reader has three distinct psychological routes from seeing a letter string to reading it aloud (see figure 1). The lexical semantic route first goes from letters to a mental dictionary of printed word forms (orthographic input lexicon), then to a semantic system encoding word meanings, then to a phonological output lexicon, storing sound information relating to words, and finally to the system producing spoken words. The lexical non-semantic route by-passes the semantic system, but uses the orthographic input and phonological output lexicons. The grapheme–phoneme correspondence route by-passes even these, allowing visually presented letters to activate phonemes and to produce speech output directly. Each of these routes, and some of their components (e.g. the orthographic input lexicon, the grapheme–phoneme rule system), are constructed by the process of learning to read. Even where a component—such as the phonological output lexicon or phoneme system—is in place prior to literacy training, route construction transforms the way in which it operates. Furthermore, these changes, wrought purely by education, affect the processing of spoken as well as written words.
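The routing described above can be made concrete with a small sketch. It is an illustration only, not an implementation of the DRC model [11]: the lexicon entries, the made-up phoneme codes, the one-letter-per-phoneme rules and all function names are assumptions, and the sketch does not model cascaded processing or how the outputs of the routes are combined.

```python
# Toy, hypothetical entries for illustration only; they are not taken from the DRC model.
ORTHOGRAPHIC_INPUT_LEXICON = {"cat", "pint"}  # known printed word forms
SEMANTIC_SYSTEM = {"cat": "feline", "pint": "unit of volume"}
PHONOLOGICAL_OUTPUT_LEXICON = {"cat": "kat", "pint": "pYnt"}  # made-up phoneme codes
GPC_RULES = {"c": "k", "a": "a", "t": "t", "p": "p", "i": "I", "n": "n", "z": "z"}


def lexical_semantic_route(letters):
    """Letters -> orthographic lexicon -> semantics -> phonological lexicon."""
    if letters in ORTHOGRAPHIC_INPUT_LEXICON and letters in SEMANTIC_SYSTEM:
        return PHONOLOGICAL_OUTPUT_LEXICON.get(letters)
    return None


def lexical_nonsemantic_route(letters):
    """Letters -> orthographic lexicon -> phonological lexicon, skipping semantics."""
    if letters in ORTHOGRAPHIC_INPUT_LEXICON:
        return PHONOLOGICAL_OUTPUT_LEXICON.get(letters)
    return None


def grapheme_phoneme_route(letters):
    """Letters -> phonemes by rule, skipping both lexicons."""
    if all(letter in GPC_RULES for letter in letters):
        return "".join(GPC_RULES[letter] for letter in letters)
    return None


def read_aloud(letters):
    """Return the pronunciation proposed by each route (None if the route fails)."""
    return {
        "lexical_semantic": lexical_semantic_route(letters),
        "lexical_nonsemantic": lexical_nonsemantic_route(letters),
        "grapheme_phoneme": grapheme_phoneme_route(letters),
    }
```

With these toy entries, the letter string "zat" is pronounced only by the grapheme–phoneme route, because it is absent from the orthographic input lexicon, whereas for "pint" the rule-based route proposes a pronunciation that differs from the one stored in the phonological output lexicon.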

Figure 1.

The dual route cascaded (DRC) model of visual word recognition and reading aloud [11].

Some of the best evidence of these radical effects of learning to read comes from behavioural studies [12]. However, the transforming effects of learning to read have been demonstrated most dramatically using brain imaging. A recent study [13] found that viewing written sentences activated large areas of the cortex more strongly in literate than in illiterate adults. These areas included: the right occipital cortex, which is involved in relatively low-level visual processing; the left perisylvian temporal and frontal language areas; and a focal area of the occipito-temporal cortex. The latter area is known as the visual word form area (VWFA) because it responds so reliably, in literate people, to the presentation of written words. If one did not know that reading is culturally inherited, it would be easy to mistake the robust response characteristics and precise localization of the VWFA for signs that the capacity to read depends on an innate module. When the subjects in this imaging experiment listened to spoken words, literacy was associated with substantially greater activation in the planum temporale, an area involved in phonological coding, and in the occipito-temporal regions that analyse visual word forms. This is consistent with other evidence that learning to read restructures our representations of spoken words [14]. After learning to read we segment spoken language into different units, and we not only hear, we also see, spoken words—they activate visual areas of the brain.

Thus, learning to read has major, constructive effects on the neurocognitive system. It does not, of course, create a new system from scratch. Like other biological and cultural processes of adaptation, learning to read takes old parts and remodels them into a new system [15]. The old parts are computational processes and cortical regions originally adapted, genetically and culturally, for object recognition and spoken language, but it is an ontogenetic, cultural process—literacy training—that makes them into a new system specialized for cultural learning.

The case of reading shows clearly that processes of cultural learning can be culturally inherited. The largely unexplored question is—how far does this go? To what extent are other processes of cultural learning also culturally inherited?

3. Social learning

In pursuit of that question, let us turn to social learning. It is commonly claimed that social learning is an important variety of cultural learning [5], but social learning is a very different case from reading, in a number of respects. First, social learning is a generic and amorphous category. Agents are said to have engaged in social learning when they have learned something by observing the actions of another agent, or the products of those actions—but only if the model's actions were not tailored to this end. If the model's behaviour was intended to communicate some information to the observer, or has evolved genetically to do so, the phenomenon is typically called ‘signalling’ or ‘teaching’ rather than social learning. Second, it is well known that other animals, not just humans, engage in social learning. For example, rat pups learn what to eat by observing the dietary choices of adults [16], and monkeys learn that snakes are dangerous by observing the fearful reactions of conspecifics to snakes [17]. This being so, it is clear that social learning is very far from sufficient for cultural evolution. Otherwise a broad range of species would show human-like cultural inheritance. However, as many authors have noted, social learning could be important for cultural evolution without being sufficient (e.g. Laland & Lewis [3], this volume), and, as we shall see, the fact that other animals are capable of social learning turns out to be very useful when we are asking about its evolutionary origins. Finally, while virtually everyone agrees that reading is a product of cultural evolution, it is widely assumed that social learning is mediated by computationally distinctive psychological processes that have evolved through gene-based selection to facilitate the non-genetic inheritance of information [18–20].

In contrast with this assumption, a review of recent evidence suggests that social learning does not involve special learning processes of either genetic or cultural origin [21]. The core mechanisms of social learning—the ‘digestive’ processes that encode information for long-term storage—are the same associative mechanisms that encode information for long-term storage when it is derived, not from observing the behaviour of others (social learning), but from direct interaction with the inanimate world (asocial learning). What makes social learning distinctively ‘social’ is the way in which input mechanisms—perceptual, attentional and motivational processes that ‘ingest’ information for learning—are biased towards information from social sources. Crucially, there is evidence that in humans this biasing is often developmental; it occurs within lifetime and as a result of sociocultural experience. The next two subsections summarize the evidence pointing to these conclusions.

(a) Core mechanisms of social learning

Five lines of evidence suggest that social learning is mediated by the same core mechanisms of associative learning that allow humans and other animals to learn by direct interaction with the world. First, studies of birds and primates have shown that, across species [22,23] and across individuals within a species [24,25], asocial and social learning capabilities are positively correlated. Animals that are good at social learning are also good at asocial learning. Second, even solitary animals—such as the common octopus and the red-footed tortoise [20]—are capable of social learning. Third, the ‘anatomy’ of social learning is very similar to the anatomy of asocial learning; different types of social learning map onto different types of asocial learning [26]. For example, specialists in the study of social learning distinguish stimulus enhancement, in which the model's activity exposes the observer to a single stimulus, from observational conditioning, in which the model's activity exposes the observer to a relationship between two stimuli. This corresponds to the distinction used by experts on associative learning between single stimulus learning (including phenomena such as sensitization and habituation) and stimulus–stimulus learning or Pavlovian conditioning. Fourth, each type of social/associative learning is found in diverse species. For example, observational conditioning occurs in humans and in damselfly larvae. Human studies indicate that participants can learn an aversion to a stimulus such as a blue square not only as a result of experiencing electric shocks in the presence of the blue square (asocial learning/Pavlovian conditioning), but also by observing a model wince, as if in pain, in the presence of the blue square (social learning/observational conditioning) [27]. Similarly, damselfly larvae learn to avoid pike, one of their predators, through exposure to pike stimuli (chemical cues in water) in conjunction with injured damselflies [28]. 
Finally, each type of social learning bears the footprints of associative learning; it has operating characteristics known to be distinctive to associative learning. For example, observational conditioning shows blocking and overshadowing effects both when it is involved in the acquisition of dietary preferences by rats, and when it mediates fear learning in humans [27,29].

These five lines of evidence suggest that, at the level of core psychological mechanisms, there is nothing ‘special’ about social learning. There is no need to ask whether the core mechanisms of social learning have been shaped by genetic or cultural evolution to promote the social transmission of information because there is no evidence that they have been adapted, by either means, to fulfil this function. However, this does not mean that there is nothing distinctive about social learning. In some cases, social learning is just learning that happens to be about events to which the individual has been exposed through social interaction. But in other cases there is evidence that input mechanisms—perceptual, attentional and motivational processes—have been adapted to make information from social sources especially salient or accessible. In principle, this kind of adaptive biasing of input mechanisms could occur phylogenetically, under the influence of gene-based selection, or ontogenetically, via learning mechanisms and through social interaction.

(b) Input mechanisms

A comprehensive survey will be necessary to establish whether adaptive biasing of input mechanisms towards social sources is predominantly phylogenetic or ontogenetic in humans. However, pending such a review, two recent studies suggest that ontogenetic processes are powerful and important.

The first study, by Behrens et al. [30], shows that input mechanisms can be biased towards (and away from) social sources by associative learning, and that this can happen flexibly on a relatively short time-scale. In this experiment people were asked repeatedly to choose between a blue and a green option to earn points that would be later turned into money. At the beginning of each trial the options showed numbers. At some times in the experiment these gave a very accurate guide to how many points would be received if the subject selected the option, and at other times they were misleading. Next the subject was offered some advice—to choose blue or green—by an unseen confederate. Like the numbers, this social information was trustworthy in some phases and untrustworthy in others. At the end of each trial the subject made her choice, and was told how many points she was going to get on that trial. Modelling of choice behaviour, and of cortical blood oxygen level-dependent (BOLD) responses during task performance, showed that people used both sources of information, the numbers and the confederate's advice, in a broadly rational way. The weights assigned to the two sources—the extent to which each input was privileged in decision making—varied with the recent trustworthiness of the source, and how rapidly the trustworthiness of each source was currently changing. Modelling of the BOLD responses also showed that the value of each source of information was being tracked using prediction error, a computational mechanism characteristic of associative learning. Each time an outcome was observed, two areas of the brain—the ventral striatum (numbers) and the medial prefrontal cortex (advice)—were updating the value of the source using the difference between the outcome expected and that which was actually observed.
Thus, this study by Behrens and colleagues shows that people rapidly and continuously decide whether or not to take advice, and that this input modulation is mediated by associative learning, i.e. processes that are genetically adapted, not for cultural learning specifically, but for tracking predictive relationships between all kinds of events.
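Prediction-error updating of the kind described above can be illustrated with a minimal delta-rule sketch. It is not the model fitted by Behrens et al. [30], whose details are not given here; the fixed learning rate, the starting values, the coding of outcomes as 0 or 1 and the function names are all assumptions made for illustration.

```python
def update_value(value, outcome, learning_rate=0.2):
    """Shift the estimated trustworthiness of a source by a fraction of the prediction error."""
    prediction_error = outcome - value  # observed outcome minus expected outcome
    return value + learning_rate * prediction_error


def track_sources(trials, learning_rate=0.2, initial=0.5):
    """Track two sources over trials.

    trials: iterable of (numbers_correct, advice_correct) pairs, each 0 or 1.
    Returns the (numbers_value, advice_value) estimates after every trial.
    """
    numbers_value, advice_value = initial, initial
    history = []
    for numbers_correct, advice_correct in trials:
        numbers_value = update_value(numbers_value, numbers_correct, learning_rate)
        advice_value = update_value(advice_value, advice_correct, learning_rate)
        history.append((numbers_value, advice_value))
    return history


# Hypothetical phases: advice is reliable for 20 trials, then unreliable for 20.
trials = [(1, 1)] * 20 + [(1, 0)] * 20
history = track_sources(trials)
```

In this sketch the estimated value of the advice rises while the advice is reliable and falls again once it becomes misleading, so a decision rule that weights each source by its current value would shift towards and away from the social source without any mechanism dedicated to social information.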

The second example of developmental biasing of input mechanisms shows that the bias can be durable and very specific, promoting attention to a particular category of social stimuli. In this study, Jack et al. [31] used eye movement tracking to measure attention to emotional faces in Western Caucasian and East Asian participants. Across all emotion types and face ethnicities, they found that Western Caucasians divided their attention more equally between the eyes and mouth than the East Asians, who focussed more on the eyes. When they looked at each emotion separately, and examined the accuracy with which the expressions were recognized, Jack et al. found that the Western Caucasians' greater attention to the mouth area resulted in better recognition of fear and disgust than in the East Asian participants. This kind of cultural tuning of attention to social stimuli could have profound effects on social learning. For example, observational conditioning is a primary means of learning the value or emotional valence of types of object or event; people, animals, plants or practices become attractive or aversive when they are paired with positive or negative expressions of emotion by others. Therefore, if Western Caucasians are more sensitive to expressions of fear and disgust, it is probable that they would learn more readily via observational conditioning that certain objects are threatening or repulsive. In this particular domain they may be faster social learners, not because they have better or different genetic adaptations for social learning, but as a result of sociocultural experience tuning input mechanisms to a particular configuration of facial features.

In summary: current evidence suggests that social learning does not involve learning mechanisms that have been adapted—genetically or culturally—for cultural inheritance. However, some examples of social learning are distinctively social in that they involve input mechanisms that are biased towards information supplied by other agents. There is evidence from human and non-human animals [21] that this biasing can itself be a product of learning; through interaction with other agents in our group or culture, we learn to privilege input from certain social sources—under specific conditions, or across contexts.

4. Imitation

Imitation (or ‘imitation learning’) is a type of social learning that is thought to play an especially important role in cultural inheritance [1,5,32,33]. It is social learning in which the observer acquires new behavioural topography—a new way of moving parts of the body relative to one another—by observing another agent. Noting that skills such as flint knapping and basket weaving require new ways of moving the hands and fingers, relative to one another and to materials, many researchers regard imitation as crucial for the cultural inheritance of instrumental–technological skills. Imitation also appears to be indispensable in the development of communicative–gestural skills, in learning the postures, gestures and ritualistic movement patterns—such as those used in dance—that promote social bonding within groups and distinguish ingroup from outgroup members [34].

Applying the distinction used in §3, the question whether imitation is made possible by genetically evolved and/or culturally inherited cognitive mechanisms can be broken down into two parts: what are the origins of (i) the core mechanisms of imitation, and (ii) the input processes that feed imitation?

(a) Core mechanisms of imitation

It has long been assumed that imitation is made possible by genetically evolved and highly specialized cognitive mechanisms [35]; by what might now be described as an innate module. This is plausible for three reasons. First, humans are Homo imitans [33]; we may not be the only species that can imitate, but the range and precision of our imitation of body movements (rather than vocalizations) far outstrips anything found elsewhere in the animal kingdom. Second, imitation makes some highly distinctive demands on the cognitive system, and it is tempting to assume that specialized problems have specialized solutions. Unlike all other forms of social learning, and indeed most other types of behaviour, imitation requires the cognitive system to solve the correspondence problem [36]; to translate observed actions into matching executed actions; action percepts into corresponding motor programmes. Third, it is difficult to imagine how domain-general cognitive processes could solve the correspondence problem, especially for actions such as facial expressions and whole body movements, which look very different to me when I am doing them and when I am watching you doing them. These three considerations have lent support to models suggesting, implicitly or explicitly, that the core problem of imitation—the correspondence problem—is solved by specialized, human-specific, innate mechanisms that were favoured by natural selection because they enable cultural inheritance.

However, recent research has ‘imagined’ a way in which the correspondence problem could be solved by domain-general cognitive processes, and provided evidence that it is, in fact, solved in this domain-general way. The imagined solution is known as the associative sequence learning (ASL) model of imitation [37,38]. If the ASL model is correct, the capacity to imitate is to a very significant extent culturally inherited. The ASL model has two related advantages over the modular view. First, rather than simply saying that there is a psychological ‘black box’ that makes imitation possible, it specifies the cognitive mechanisms that solve the correspondence problem. Second, through this specification the ASL model makes testable predictions about imitation that have been confirmed by a now extensive body of experimental work. The remainder of this section gives an outline of the ASL model, explains why it implies that imitation is culturally inherited, and surveys some of the evidence supporting the model.

The ASL model suggests that imitation is made possible by direct, excitatory connections between visual and motor representations of action; between ‘mental images’ of what an action ‘looks like’ and what it ‘feels like’ to perform the action (see figure 2). These connections, or matching vertical associations, are forged in the course of an individual's development by the same domain- and species-general processes of associative learning that produce Pavlovian and instrumental conditioning in the laboratory. When an observer copies a novel sequence of actions, the operation of matching vertical associations is guided by domain-general processes that encode the serial order of visual stimuli. These horizontal processes learn what the novel action sequence looks like. The representation they construct would be sufficient for subsequent recognition of the sequence, and to distinguish it from sequences containing the same components in a different order. However, for imitation of a novel action—to turn vision into matching action—the visual sequence representation formed by horizontal processes must activate, in the appropriate order, a matching vertical association for each element of the sequence. Therefore, it is the vertical associations—connecting visual and motor representations of the same action—that solve the correspondence problem. They are the core mechanisms of imitation.
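The division of labour between horizontal and vertical processes can be sketched as follows. This is a deliberately simplified illustration rather than the ASL model itself [37,38]: the action labels, the dictionary of associations and the function names are hypothetical, and associations are treated as all-or-none rather than graded.

```python
# Hypothetical matching vertical associations: visual representation -> motor representation.
VERTICAL_ASSOCIATIONS = {
    "see_grasp": "do_grasp",
    "see_lift": "do_lift",
    "see_place": "do_place",
}


def encode_sequence(observed_actions):
    """Horizontal process: store what the sequence looks like, in serial order."""
    return tuple(observed_actions)


def imitate(observed_actions, vertical_associations):
    """Activate, in order, the motor representation linked to each observed element."""
    visual_sequence = encode_sequence(observed_actions)
    unmatched = [seen for seen in visual_sequence if seen not in vertical_associations]
    if unmatched:
        # The sequence can still be recognized, but it cannot be turned into action.
        raise LookupError(f"no vertical association for: {unmatched}")
    return [vertical_associations[seen] for seen in visual_sequence]
```

Here `encode_sequence` alone is enough to tell two orderings of the same elements apart, but only the lookup in `vertical_associations` turns the visual sequence into matching action.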

Figure 2.

The associative sequence learning (ASL) model of imitation.

Processes of associative learning strengthen excitatory connections between pairs of event representations when the occurrence of the two events is correlated, i.e. when they occur relatively close together in time (contiguity) and one event is predictive of the other (contingency). Therefore, a matching vertical association for, say, finger splaying would be formed by experience in which the sight of finger splaying is correlated with the performance of finger splaying. In terms of their internal structure, the processes of associative learning could just as easily produce non-matching as matching vertical associations. If the sight of one action, X, is correlated with the performance of a different action, Y, associative learning will strengthen the connection between a visual representation of X and a motor representation of Y, supporting counter-imitative rather than imitative behaviour. The ASL model implies that matching vertical associations predominate, and therefore that humans develop a capacity for imitation, rather than for counter-imitation, because certain features of the human developmental environment ensure that we more often experience correlations between observation and execution of the same action than of different actions. For example, experience of the former kind comes from direct self-observation (e.g. looking at your own hands in motion), mirror self-observation (using reflective surfaces), being imitated by others (especially facial imitation of infants by adults), synchronous activities of the kind involved in dance, sports and military training, and indirectly via the use of action words [39]. Notice that nearly all of these kinds of experience involve interaction with cultural artefacts (mirrors) or with other people in culture-specific contexts. Even an infant's opportunity to look directly at her own hands in motion is modulated by culture-specific childrearing practices such as swaddling. 
Therefore, the range of actions for which an individual has matching vertical associations—the range of actions she is able to imitate—depends on sociocultural experience and is culturally inherited along with artefacts, practices, rituals and verbs.
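The claim that the same learning process yields imitation or counter-imitation, depending only on experience, can be illustrated with a small simulation. It is a sketch under stated assumptions, not the ASL model's own formalism: the delta-rule update, the learning rate, the two hand actions and the function names are hypothetical choices made for illustration.

```python
def learn_vertical_associations(experience, motor_actions, learning_rate=0.1, weights=None):
    """Strengthen links between seen and performed actions according to their co-occurrence.

    experience: iterable of (observed_action, executed_action) pairs.
    Returns a dict mapping (observed, executed) to association strength in [0, 1].
    """
    weights = dict(weights or {})
    for seen, done in experience:
        for motor in motor_actions:
            key = (seen, motor)
            target = 1.0 if motor == done else 0.0  # did this motor action accompany the sight?
            old = weights.get(key, 0.0)
            weights[key] = old + learning_rate * (target - old)
    return weights


def response_to(seen, weights, motor_actions):
    """Return the motor action most strongly associated with the observed action."""
    return max(motor_actions, key=lambda motor: weights.get((seen, motor), 0.0))


ACTIONS = ["hand_open", "hand_close"]

# Experience in which seeing and doing the same action are correlated.
matching = [("hand_open", "hand_open"), ("hand_close", "hand_close")] * 30
weights = learn_vertical_associations(matching, ACTIONS)
before = response_to("hand_open", weights, ACTIONS)

# Experience in which seeing one action is correlated with doing the other.
non_matching = [("hand_open", "hand_close"), ("hand_close", "hand_open")] * 30
retrained = learn_vertical_associations(non_matching, ACTIONS, weights=weights)
after = response_to("hand_open", retrained, ACTIONS)
```

Nothing in the update rule favours matching over non-matching pairs; in this sketch `before` is the imitative response and `after` the counter-imitative one solely because the two bodies of experience differ.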

Unlike previous accounts of the cognitive mechanisms mediating imitation, the ASL model has been explicitly tested against alternative models. These experiments have examined the imitation of familiar actions and of novel sequences of actions, using behavioural and neurophysiological measures, and probing the model's hypotheses about both vertical and horizontal processes (see [38] for a review). Supporting the idea that matching vertical associations are forged by associative learning, these studies have shown that novel sensorimotor experience can enhance [40], abolish [41] and even reverse [42,43] simple imitative behaviour, and that these effects depend on the contingency between observed and executed actions [44]. It has been widely reported that humans typically show ‘automatic imitation’ of various hand and foot movements: in tasks that require us to ignore the sight of these movements, we nonetheless respond faster and more accurately when the required action matches an observed body movement [45]. Hand opening is faster when observing hand opening than when observing hand closing, foot lifting is faster when observing foot lifting than hand lifting, and so on. These imitative effects appear to be relatively impervious to the actor's intentions, but they can be changed by retraining [45]. For example, without explicit training, passive observation of index finger movement activates muscles that move the index finger more than muscles that move the little finger. However, after training in which people were required to respond to index finger movements with little finger movements, and vice versa, this pattern was reversed. Observation of index finger movement activated little finger muscles more than index finger muscles, implying that associative learning had converted automatic imitation into automatic counter-imitation [42,43].

Similarly, experiments examining the imitation of novel sequences of actions have provided evidence that it involves the same kind of sequence learning processes as non-imitative tasks; that these processes do not depend on intention-reading [46]; and that they do not show the flexibility one would expect if imitation were mediated by dedicated mechanisms [47]. For example, when people are required to imitate a sequence of movements involving the grasping of a pen and its placement in one of two containers, they show exactly the same pattern of errors as when they are instructed to perform the same movements by flashing geometric shapes. Error patterns are indicative of underlying cognitive processes. Therefore, these results indicate that the same sequence encoding mechanisms are recruited in imitative and non-imitative tasks, and by stimuli that do and do not support the attribution of intentions [46].

Two objections are commonly raised against the ASL model and its implication that the capacity to imitate is culturally inherited. The first, the poverty of the stimulus objection, holds that the ASL model must be wrong because there is evidence that newborn babies can imitate a range of actions. This objection will be addressed only briefly here because it has been examined in detail in a recent review [39]. Building on previous analyses [48], this review found evidence that neonates copy only one action—tongue protrusion—and that this copying does not show the specificity characteristic of imitation [49]. Figure 3 illustrates the first of these points. For each of the action types tested in young infants, it shows the number of published studies reporting positive evidence of imitation and the number reporting negative evidence. This is a highly conservative measure of how often young infants have failed imitation tests because it is much harder to publish negative than positive results. Nonetheless, figure 3 shows that the number of positive reports substantially exceeds the number of negative reports only for tongue protrusion. Evidence that even tongue protrusion matching lacks the specificity characteristic of imitation comes from studies showing that tongue protrusion can be elicited by a range of arousing stimuli, including flashing lights and lively music [49].

Figure 3.

Summary of research on imitation in neonates and young infants. The number of published experiments reporting positive results (grey bars) and negative results (black bars) for each of the eight gestures tested [39].

The second objection says that the ASL model may be a correct description of how the capacity to imitate got off the ground, but not of the way in which the core mechanisms of imitation develop in contemporary humans. Perhaps our ancestors started out by learning to imitate ‘from scratch’—without any inborn vertical associations for imitation, and using domain-general associative mechanisms—but then the capacity to imitate proved to be so useful that there was selection in favour of genetic mutations that canalized [50], prepared [51] or genetically assimilated [52] the learning of matching vertical associations. I will call this the genetic assimilation objection, but really it is a family of objections because the mutations could have acted in a variety of ways. They could have established stronger inborn connections between visual and motor representations of the same actions than of different actions; enhanced the speed or probability of learning matching, relative to non-matching, vertical associations; or acted to preserve the functioning of matching vertical associations once they have been established.

The first thing to note in relation to the genetic assimilation objection is that the ASL model does not deny that some vertical associations may be inborn or easier to learn than others, and that this could be owing to genetic evolution. However, the ASL model suggests that any ‘privileged’ associations of this kind are genetic adaptations for the visual guidance of action. Therefore, the ASL model would be compatible with the discovery of, for example, stronger inborn connections between visual representations of large objects and motor representations of power (rather than precision) grips, but it would not be compatible with evidence of stronger inborn connections between visual representations of power grips and motor representations of power (rather than precision) grips. In other words, the ASL model naturally embraces the idea that imitation is based on genetic adaptations—associative learning itself is a genetic adaptation—but it denies that these genetic adaptations are for imitation or any other aspect of social cognition.

The second thing to note about the genetic assimilation objection is that it points out a logical possibility; it does not advance any concrete evidence that the possibility has been realized. It is not based on evidence that matching vertical associations are inborn, easier to learn, or more resistant to change than non-matching vertical associations. In contrast, as indicated above, the ASL model has generated novel predictions about imitation and those predictions have been confirmed in a range of experiments. For example, these experiments have shown that automatic imitation can be abolished and even reversed by relatively brief periods of novel sensorimotor training [40–42]. This is what one would expect if both matching and non-matching vertical associations are established via standard mechanisms of associative learning, and it provides no encouragement whatever for the view that the acquisition of matching vertical associations has been genetically assimilated for imitation [53].

Finally, a recent neuroimaging study of the mirror neuron system sought, and failed to find, evidence of even a relatively weak form of genetic assimilation; evidence that the learning that produces matching vertical associations is specialized or constrained to link sensory representations of action, rather than of inanimate stimuli, with motor representations [54]. Mirror neurons are cells found in the premotor and parietal cortex of monkeys and humans that discharge when a certain type of action (e.g. power grip) is observed, and selectively when the same type of action is executed [55,56]. The ASL model suggests that each mirror neuron is the ‘motor end’ of a matching vertical association. The ‘visual end’—the visual representation that has become linked with a matching motor representation—is typically located in the superior temporal sulcus, an area that specializes in the visual processing of biological movement [53]. Press et al. [54] asked whether mirror neurons are canalized or prepared to develop in this way—to represent correlations between observed and executed movements—or whether, as the ASL model predicts, they are plastic enough to represent correlations involving geometric shapes. They found a striking degree of plasticity. A brief period of shape–action sensorimotor training was sufficient to create ‘geometrical shape mirror neurons’; that is, to link mirror neurons with visual neurons in parts of the brain that are not specialized for action processing. This kind and degree of plasticity is not what one would expect if the development of mirror neurons had been genetically assimilated for imitation or indeed any other social cognitive function.

To summarize: the ASL model is currently the most empirically successful description of the core mechanisms of imitation, of the neurocognitive processes that make imitation possible by solving the correspondence problem. This model suggests that associative learning and specified types of sociocultural experience convert a system which is genetically adapted for visuomotor control into a system that is culturally adapted for imitation.

(b) Input mechanisms: under- and over-imitation

The core mechanisms of imitation determine whether an agent can, but not whether she will, imitate an observed action. The core mechanisms make imitation possible, but in most cases this potential is not automatically translated into overt imitative action. Rather, the agent decides, consciously or unconsciously, whether to enact imitation, and like all decisions relating to voluntary action, this decision depends crucially on motivational processes; it depends on what the agent expects the outcome of imitation to be, and on the value she assigns to that outcome. Recent research on under-imitation and over-imitation in children has been interpreted as indicating that, even if the core mechanisms of imitation are not genetic adaptations, imitation involves motivational processes that have been shaped by genetic evolution to promote cultural inheritance. This is an interesting and wholly coherent hypothesis but it does not currently have clear empirical support. There is no reason to doubt that humans are better at all forms of cultural learning than other animals [57], or that enhanced social motivation is important in promoting human imitation, but research on under- and over-imitation has not yet established that human social motivation has been enhanced by genetic rather than cultural processes.

Under-imitation (or, as it is more commonly known, rational imitation) provides primary support for the natural pedagogy hypothesis—the idea that human infants have genetic adaptations making them sensitive to the teaching intentions of adults; to behavioural cues indicating what adults want infants to learn [58]. In the original study of under-imitation [59], 12- to 14-month-old infants saw an adult switching on a light by touching the light with her forehead. When the adult did this, her hands were occupied, holding a blanket around her body, or free, lying on either side of the light box. When given access to the box themselves, infants who had seen hands occupied were less likely to copy the head movement—they under-imitated—relative to infants who had seen hands free. This was taken to indicate that the infants had worked out, using cognitive mechanisms genetically adapted for natural pedagogy, that the adult had only used her head because her hands were occupied, and, given that the infants were not similarly constrained, they could use their hands instead. However, a recent study challenged this interpretation by suggesting that the infants who saw hands occupied under-imitated simply because they were distracted by the blanket, and therefore less likely than the hands free infants to notice that the adult used her head [60]. This study replicated the under-imitation effect found in the original study, but also included a group of infants who were habituated to the sight of the blanket before seeing the hands occupied demonstration. When the potential for distraction was removed in this way, the hands occupied group was just as likely as the hands free group to copy the head action. Thus, the results of this study undermine a major plank of the current evidence for natural pedagogy, and the component that relates natural pedagogy most directly to imitation.

Over-imitation refers to children's tendency to imitate more components of an adult's action than is strictly necessary to obtain the outcome achieved by the model. For example, Lyons et al. [61] allowed 3- to 5-year-old children to observe an adult performing a four-component sequence of actions on a puzzle box, which terminated in the retrieval of a toy turtle from the box. The first two components (using a wand to remove a bolt, and tapping the wand on the box) were causally irrelevant; they were not necessary to get access to the toy. Nonetheless, these components were imitated along with the causally relevant components, even when the children had been trained to discriminate actions that they ‘had to do’ from ‘silly’ actions, and when they had been told ‘You can get it out however you want’.

There are many different hypotheses about what is going on in the minds of children during over-imitation. The causal hypotheses see over-imitation primarily as a window on children's developing understanding of causality, whereas the social hypotheses suggest that over-imitation results from distinctively social motivation; the desire to be like adults, to share experiences with others, to be liked by the model, and/or to uphold social norms [62]. Assuming that social motivation at least contributes to over-imitation, these social hypotheses raise the question of how children come to be so highly socially motivated. Some discussions imply that heightened social motivation is inborn, and that it is a genetic adaptation for cultural inheritance. This is certainly possible, but the idea has not yet been tested systematically against the obvious alternative: in the course of early development, children are reliably and richly rewarded by adult approval for imitating a broad range of actions [63], and this not only contributes to the development of matching vertical associations (see §4a), but also leaves children with the expectation that imitative behaviour will be valued and therefore rewarded. Indeed, through an associative process known as higher-order conditioning, it could make imitating or agreeing with others [64] rewarding in its own right. This hypothesis suggests that social motivation is culturally inherited; infants acquire it through social interaction with adults because the adults are themselves socially motivated. To test it against the idea that social motivation is a genetic adaptation for cultural inheritance, one would need, for example, a full programme of transfer experiments in which children are systematically rewarded or not rewarded for over-imitation in one set of tasks, and then tested for over-imitation in another set of tasks involving different adults and materials.

In the meantime, as this section has indicated, it is an open question to what extent the motivational processes that guide imitative behaviour are genetically and/or culturally adapted in ways that support cumulative cultural evolution.

5. Conclusions and future directions

Cultural inheritance is what ‘makes us odd’ [5]; it plays a major role in making human minds and lives radically different from those of other animals. This article has raised the possibility that cultural learning is itself culturally inherited—rather than being genetic adaptations, the psychological processes that make cultural inheritance possible are learned in the course of ontogeny through social interaction. It has begun to investigate this possibility by looking at three types of cultural learning: reading, social learning and imitation. In the case of social learning, current evidence suggests that the core mechanisms of learning have not been adapted—either genetically or culturally—to promote cultural inheritance. However, there are signs that input mechanisms can be biased towards social sources, and, in humans, that these adaptive biases can be driven by social interaction. Thus, the input mechanisms, or psychological ‘mills’, that modulate social learning have characteristics that are culturally inherited—e.g. offspring inherit from their cultural parents a tendency to focus on the eyes, or on the eyes and mouth, when viewing facial expressions of emotion [31]—and these mill characteristics influence the ‘grist’ that is culturally inherited through social learning—e.g. beliefs about what kinds of foods are and are not disgusting. In the cases of reading and imitation, the evidence indicates that core cognitive mechanisms, new modules, are constructed through social interaction. Learning to read reconfigures the neurocognitive system to create several distinct routes for word recognition and reading aloud. It is not only the grist of what we read—the ideas and values coded in text—but also these routes, these psychological mills, which are passed down from one cultural generation to the next through literacy training. Similarly, correlated experience of seeing and doing the same actions (e.g. while engaging in synchronous action, and being imitated) makes imitation possible by establishing a vast repertoire of matching vertical associations, many of which are embodied in the mirror neuron system. Matching vertical associations are the culturally inherited mills which enable, and are enabled by, the imitation-mediated cultural inheritance of grist consisting of specific techniques, practices and rituals.

To find out more about the origins of cultural learning we need experiments explicitly designed to test genetic adaptation against cultural adaptation hypotheses. This is, of course, very difficult to do. The hypotheses relate to evolutionary history, but the minds available for experimental analysis are of adults, children and non-human animals alive today. However, training studies, which examine the effects of novel experience on cultural learning, and cross-cultural studies, comparing cultural learning in groups that have received different sociocultural experience throughout life, can tell us a lot about the poverty, or wealth, of the stimulus. To the extent that the neurocognitive mechanisms of cultural learning have features they could not have acquired in the course of development, genetic adaptation is implicated. To the extent that they have features that could be acquired in the course of development, and that co-vary flexibly with sociocultural experience, cultural adaptation is implicated. Distinguishing genetic from cultural origins is a thorny methodological problem, but it is also one in which the burden of proof is equally distributed. As the example of reading illustrates most clearly, we know that cultural learning can be culturally inherited. Therefore, we cannot assume by default that any given type or feature of cultural learning is a genetic adaptation.

It is also important to extend the enquiry from reading, social learning and imitation to other types of cultural learning—including social motivation, theory of mind, teaching/pedagogy, and norm representation—and, if further research confirms that cultural learning is to a significant extent culturally inherited, to address the broader implications of this discovery. Perhaps the most far-reaching of these concerns the extent to which cultural evolution is constrained by biological evolution. If it is not just the grist but also the mills that are culturally inherited, cultural evolution may be on a remarkably long ‘genetic leash’ [65].

Good, hard questions hover over any discussion of the evolution of human cognition. What made the evolution of human minds ‘take off’? How exactly do our minds differ from those of other animals? These are important questions but we must be careful not to turn them into a party game where the prizes go to simple, spectacular answers [66]. It would be convenient if we could identify a ‘big bang’ of human cognitive evolution, or a small number of distinctively human, genetically evolved cognitive modules. However, as many of the articles in this theme issue make clear, the true story is more likely to be one in which multiple sources of selection pressure resulted in gradual gene–culture co-evolution of a distinctive set of cognitive processes. The view advanced in this article—that mechanisms of cultural learning are themselves culturally inherited—is compatible with the idea that human cognitive processes are distinctive in their plasticity and domain-generality [67,68]; in the degree to which they have released us from genetically determined modularity of mind. More specifically, it suggests that much of this has been achieved by increases in the range and power of associative learning, and the gene–culture co-evolution of sequence processing mechanisms; mechanisms now involved in processing vocal language and imitation learning, that began to evolve in the context of tool making and gestural communication [69]. It also implies that a good deal of the heavy lifting in the evolution of human cognition has been done by genetic and cultural adaptation of input mechanisms rather than core cognitive mechanisms; by perceptual specializations, attentional biases, inhibitory processes and motivational changes of the kind that yield social tolerance [52]. It is not one big thing, but many small things, that ‘make us odd’.

Acknowledgements

Thanks to Max Coltheart for supplying figure 1 and pointing out, long ago, that I had produced a ‘reading model’ of imitation, and to Uta Frith, Chris Frith and Kevin Laland for valuable comments on the manuscript.
