Royal Society Publishing

Evo-devo, deep homology and FoxP2: implications for the evolution of speech and language

Constance Scharff, Jana Petri


The evolution of novel morphological features, such as feathers, involves the modification of developmental processes regulated by gene networks. The fact that genetic novelty operates within developmental constraints is the central tenet of the ‘evo-devo’ conceptual framework. It is supported by findings that certain molecular regulatory pathways act in a similar manner in the development of morphological adaptations that are not directly related by common ancestry but evolved convergently. The Pax6 gene, important for vision in molluscs, insects and vertebrates, and Hox genes, important for tetrapod limbs and fish fins, exemplify this ‘deep homology’. Recently, ‘evo-devo’ has expanded to the molecular analysis of behavioural traits, including social behaviour, learning and memory. Here, we apply this approach to the evolution of human language. Human speech is a form of auditory-guided, learned vocal motor behaviour that also evolved in certain species of birds, bats and ocean mammals. Genes relevant for language, including the transcription factor FOXP2, have been identified. We review evidence that FoxP2 and its regulatory gene network shape neural plasticity in cortico-basal ganglia circuits underlying sensory-guided motor learning in animal models. The emerging picture can help us understand how complex cognitive traits can ‘descend with modification’.

1. Introduction

The purpose of this paper is twofold. On the one hand, we will review recent contributions from molecular genetics and neurobiology to the understanding of acoustic communication in humans and animals, and we will place those findings in a larger evolutionary developmental framework bearing on the evolution of language. On the other hand, we would like to critically re-examine the evidence for the claims that certain features of language are unique to humans and absent from all animals.

The incorporation of concepts from developmental biology into evolutionary theory has led to the field of ‘evo-devo’ [1]. Before this, D'Arcy Thompson [2], Alan Turing [3] and John Maynard Smith [4], among others, had entertained the notion that the structure, composition and dynamics of the developmental system limit phenotypic variability. During the last decade, experimental data from many systems have illustrated how developmental principles and constraints actually contribute to morphological novelty. Drawing on molecular and genetic methods, developmental biologists have uncovered conserved molecular networks that shape the morphology of different species [5]. How these molecular pathways change during evolution and thereby contribute to morphological adaptations is a central theme in current evo-devo research [6–8]. General concepts are emerging that may apply not only to the evolution of form, but also to the evolution of behaviour ([9]; see also [10–12]).

Language and music are behaviours that constitute a fascinating evolutionary puzzle. Animals are usually considered to have neither language nor music, so how, when and why did these traits evolve in the human lineage? Did they evolve gradually [13] or through a sudden change [14]; were they driven by natural [15], sexual [16] or relaxed [17] selection; or did they emerge via other processes [18]? Scholars throughout documented history have written on the subject. To quote Noam Chomsky, ‘There are libraries of books and articles about evolution of language—in rather striking contrast to the literature, say, on the evolution of the communication system of bees’ [14, p. 14], and many different scenarios of how language evolved have been proposed [19–28]. However, as one eminent contemporary scholar in the field, Tecumseh Fitch, succinctly put it, ‘… discussion of the evolution of language often involve more speculation than data…’ [29, p. 258]. Given this skewed relationship between the amount of empirical data and the number of publications, it is perhaps not surprising that appraisals of the potential contribution of animal studies to human language evolution have shifted, from initially very little [30] to, more recently, a lot more [13,28,31]. Where one stands on this issue depends on one's view of the ‘uniquely human aspects’ of language. Which aspects of language are unique to humans is still a matter of debate [32–34], but that some aspects of language are uniquely human and absent from all animal communication is not questioned in the literature. This is somewhat surprising, given that the communication systems of many animals have hardly been studied exhaustively. We will return to this question below. What is clear, though, is that human language must have emerged through qualitative and quantitative modifications of morphology that existed in our primate ancestors, e.g. changes in the size of brain regions, alterations in the strength of neural connections, creation of new neural pathways, and morphological alterations of the peripheral sound production and sound perception machinery. Among the molecular mechanisms known to shape such changes are heterotypy (altered gene products), heterochrony (altered timing of gene expression), heterotopy (altered spatial gene expression) and heterometry (altered amounts of expression). How these changes come about in evolution is a topic of lively debate and investigation.

An evo-devo framework and the concept of deep homology [35,36] increasingly permeate the thinking of biologists in all fields, but may be less familiar to linguists, cognitive scientists, psychologists and philosophers who are interested in the evolution of language. We propose that, in considering the evolution of human language, an evo-devo approach can provide a useful theoretical framework to study ‘genetic modules’ that are necessary components of language. Specifically, we argue that the FoxP2 transcription factor and the regulatory molecular network with which it interacts may be part of a molecular toolkit that is essential for sensory-guided motor learning in cortico-striatal and cortico-cerebellar circuits in humans, mice and songbirds, and perhaps even invertebrates. The nomenclature for Forkhead (Fox) genes follows Kaestner et al. [37]. Essentially, the spelling is: human, FOXP2; mouse, Foxp2; all other species, FoxP2. By convention, genes and mRNA are italicized, proteins are not.

2. Language, speech, animal communication and vocal learning: some definitions

When discussing the evolution of language in the context of animal vocalizations, it is necessary to define some terms. Human communication can be verbal, using words, or non-verbal, such as cries, sighs, laughter and gestures. Non-verbal communication may have been the substrate for the development of so-called protolanguage [38]. Spoken language expresses facts, thoughts and feelings orally, using specific sounds that are created by a precisely coordinated motor programme involving the jaw and orofacial musculature as well as the muscles of the larynx, neck, chest and abdomen. This motor aspect of language is called ‘speech’. Non-verbal and verbal vocalizations occur in many behavioural contexts: they are voiced in communicative settings, but also when alone (‘inner voice’).

Animals also communicate in many different contexts, including mate attraction, territorial defence, group cohesion, foraging and parent–offspring interactions [39,40]. In addition, animals may vocalize outside of any obvious communicative context, as when male zebra finches sing ‘undirected song’, which often occurs while the birds are alone [41]. Interestingly, ‘talking to oneself’ is one of the traits that has been described as ‘human unique’ (e.g. ‘It seems likely that these private soliloquizing, praying or talking to oneself are uniquely human activities that evolved on top of prior purely social communicative form of language’) [42]. In fact, not only do songbirds sing when they are alone; parrots and chimpanzees that have been taught to imitate human words, vocally or as signs, also do not limit their use to a communicative context [43,44].

An approach complementary to organizing animal vocalizations by behavioural context is to sort them by bioacoustic features. Calls are mostly short utterances, often produced in variable temporal sequences and usually in specific circumstances, as with begging calls, alarm calls, contact calls, food calls or distress calls. Songs are typically longer utterances with more stereotyped temporal sequences, frequently associated with territorial defence or reproduction, including pair cohesion and mate attraction. Songs occurring in these contexts have been described, for instance, in birds [45], mice [46,47], bats [48–50], whales [51,52] and gibbons [53–55].

‘Learning’ of human language and birdsong is often used synonymously with the terms ‘vocal learning’ or ‘production learning’, which refer to a change in the sound signal as a result of experience [56]. For instance, infants' first words emerge from non-verbal babbling as a result of their experience with adult language. This type of learning has been documented in only a few species, among them all songbird species studied [45,57], various species of parrots [45,58,59], Anna's hummingbirds [60], sac-winged bats [61], a harbour seal [62] and two elephants [63]. In contrast, many animals can associate an existing sound signal with a new context (referred to as ‘contextual’ or ‘auditory’ learning) [56]. For instance, a young vervet monkey can learn to associate hearing a particular alarm call with the presence of a particular predator (‘comprehension learning’), and it can even learn, by observing conspecifics, to produce a particular innately specified call in a particular predator context (‘usage learning’) [64,65]. Contextual auditory learning is common in animals, whereas vocal production learning is not.

Both language and birdsong are best learned during a ‘period of opportunity’, also called a ‘sensitive period’, in early development [66].

3. Deconstructing language into biologically tractable units

In broadest terms, language can be divided into a conceptual–intentional system that deals with thoughts and meaning, and a sensorimotor system that deals with the acoustic analysis of speech sounds and their production [32]. Of course, this is an oversimplification: the two systems feed back on each other, in the same way that the brain influences behaviour and behaviour influences the brain.

Which features of language can be analysed in animal models? In the 1960s, some linguists considered human language and animal communication to be so categorically different that they were essentially incomparable [30]. Hockett, by contrast, postulated a continuum of complexity among animal communication systems, including human language [67]. These positions still mark the two ends of the spectrum, but with the emergence of biolinguistics as a research field [68–70], the gap between the two camps is slowly being bridged [14,32,33]. The present review is a contribution to this biolinguistic perspective. Reviewing the literature on the link between FOXP2 and language in people and its role in birdsong and other animal models will lead us to propose, in §4, that this transcription factor and its associated molecular network may constitute one of the constraints that channel evolutionary patterns towards similar outcomes, such as learned vocal communication in diverse taxa.

But before reviewing the evidence, we will briefly challenge the assertion, made in most if not all writings on the subject, that animal studies can contribute only to the sensorimotor ‘speech’ aspects of language, whereas syntax and particularly semantics, the language domains that allow the externalization of the conceptual–intentional system, do not exist in animal communication. Since animals are increasingly recognized as having concepts and intentions [71–73], we think it is useful to re-examine the reasons why animals are assumed not to communicate about them. Below, we will therefore review different aspects of language and which of those are known to exist in animals (§3(a)), and lay out the advantages of a comparative approach to language evolution (§3(b)).

(a) Conceptual–intentional processes in animal communication?

Hockett [67] proposed that human language is characterized by the sum of a set of ‘design features’, some of which he recognized as present in animals. Among those were auditory properties of sound, and the fact that communication systems have senders and receivers. Others he assumed to be exclusive to language. We will examine the evidence for their unique status one by one.

‘Traditional transmission’, i.e. the social learning of language, was considered by Hockett to be uniquely human. However, as mentioned above, a number of animals are also capable of socially mediated vocal imitation, a fact already recognized by Aristotle [19].

‘Duality of patterning’ refers to the ability to create from a finite set of sounds an infinite set of words, and from these an infinite set of sentences. This, too, is regarded as unique to human language. Two arguments against duality of patterning in animal vocal communication have been put forward. The first is that an animal with a small sound repertoire has only a limited number of possible combinations, although even very small (and not necessarily imitatively learned) repertoires can in theory code large amounts of information (e.g. binary computer code, or the 4-letter DNA ‘alphabet’). In animals with large repertoires, the potential limitation of combinatorial coding space is even less of an issue. The second argument comes from bird song researchers, who have compared the number of distinct sound elements in the songs of different species with the number of combinations in which they occur. In contrast to the situation in language, all investigated bird species have more song elements in their repertoire than combinations of those elements [74], which at first sight argues against the existence of duality of patterning. In addition, in many bird species, strings of song elements that occur together usually do so in a hierarchically structured, mostly unidirectional way, again in contrast to language. This is also the case, for instance, for free-tailed bat song [49]. However, the situation is different if one considers the ‘combinatorial unit’ to be not the smallest song element (also called a note, a syllable or an element in the bird song literature) but the next larger unit of song, i.e. a string of ordered elements occurring together (called a song type, motif or phrase in the literature). In some species with large repertoires, song types occur in long, non-random and non-unidirectional sequences during song bouts that last from minutes to many hours [75–78]. At this organizational level, there is much more room for complex sequential rules that could be rich enough to carry semantic information. In fact, song type (or motif) order in many birds and some bats [49] can be much more dynamic than note order within song types.
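
The two counting arguments in the preceding paragraph can be made concrete with a minimal Python sketch. It assumes that a repertoire can be represented as a list of combinations (song types or words), each an ordered sequence of elements (notes or phonemes); the function names and both toy repertoires are hypothetical illustrations, not the method or data of [74]. The first two functions show that an alphabet of k symbols yields k to the power n distinct strings of length n, so even binary or 4-letter codes have a large coding space; the third counts distinct elements against distinct combinations.

```python
from collections.abc import Iterable, Sequence
import math


def coding_capacity(alphabet_size: int, length: int) -> int:
    """Number of distinct strings of a fixed length over an alphabet (k ** n)."""
    return alphabet_size ** length


def bits_per_symbol(alphabet_size: int) -> float:
    """Maximum information carried by one symbol of an alphabet, in bits."""
    return math.log2(alphabet_size)


def repertoire_counts(units: Iterable[Sequence[str]]) -> tuple[int, int]:
    """Return (distinct elements, distinct combinations) for a repertoire.

    Each unit is one combination (e.g. a song type or a word) written as an
    ordered sequence of elements (e.g. notes or phonemes).
    """
    combinations = {tuple(unit) for unit in units}
    elements = {element for combo in combinations for element in combo}
    return len(elements), len(combinations)


# Hypothetical bird-like repertoire: each note type occurs in only one song type.
bird_like = [("a", "b", "c"), ("d", "e", "f"), ("g", "h"), ("a", "b", "c")]

# Crude language-like stand-in: letters play the role of phonemes.
language_like = ["cat", "act", "tack", "attack", "tact"]

if __name__ == "__main__":
    print(coding_capacity(2, 8))              # 256 binary strings of length 8
    print(coding_capacity(4, 10))             # 1048576 DNA-like strings of length 10
    print(bits_per_symbol(4))                 # 2.0
    print(repertoire_counts(bird_like))       # (8, 3): more elements than combinations
    print(repertoire_counts(language_like))   # (4, 5): fewer elements than combinations
```

Counting distinct combinations says nothing about whether they carry meaning; the sketch only illustrates the structural comparison, and changing the unit of analysis (notes versus song types) changes which items are counted as elements.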

If animals were indeed using such sequences in a combinatorial way to convey semantic content (hypothetically speaking), they would have to integrate information conveyed over longer song sequences, which requires sufficient auditory memory capacity. Working memory for auditory sequences in at least one songbird, the starling, is in the same range as that of humans [79], and starlings may use strategies similar to those of humans to store and retrieve serially ordered auditory communication signals [80]. Beyond song, birds are known to have large memory capacities; for instance, food-caching birds can memorize the locations of hundreds of stored food items over a period of several days [81]. Similarly, females of various bird species visit 10 or more potential mates before settling on one. In the well-documented case of satin bowerbirds, this search can take weeks, during which females must remember features of those potential mates [82]. Thus, in principle, birds can hold many items in memory over varying periods of time.

In human language, grammatical rules structure the plethora of combinatorial possibilities. For instance, an English speaker knows that a sentence starting with ‘if’ will be followed by a ‘then’ (implicit or explicitly stated), but the number of words in between can vary; the conditional relationship nonetheless remains completely clear. Likewise, the fact that syntax codes the difference in meaning between ‘Anna laughed at Ben’ and ‘Ben laughed at Anna’ is a feature generally assumed not to exist in animal vocalizations. There is no known parallel in animal communication, but to our knowledge, the idea that combinatorial signalling carries semantic information at levels beyond song syllable combinations has not been formally explored. Testing it would require analysing whether particular strings of song types sung in particular sequences correlate with different situations or behaviours. These questions are hard, but not impossible, to address experimentally.

‘Productivity’, the ability to create new utterances by combining existing ones, was also considered by Hockett to be uniquely human. For example, ‘enter’ can be combined into ‘entertain’ or ‘enterprise’, which mean different things, and adding ‘im’ to ‘patient’ reverses the meaning. To our knowledge, few studies have addressed whether different combinations of calls or song elements could have different meanings. Where this has been investigated, for instance in Campbell's monkeys, the animals were found to combine calls in ways that change their meaning [83].

‘Semanticity’, i.e. the fact that language is about things and can express arbitrary thoughts, is usually considered to be exclusive to humans. Two quotes exemplify this view: ‘… the elements of birdsong or whale or gibbon song are not put together by the animals in such a way that the whole song conveys some complex message assembled from the meaning of the parts’ [42] and ‘If there are nonhuman species with open-ended semantics, they are remarkably clever at hiding these abilities from generations of dedicated ethologists’ [84]. In fact, generations of dedicated ethologists have not addressed ‘the meaning of parts’ of song, but have instead operated under the assumption that animals communicate mainly about ‘fighting’ or ‘flirting’ in a non-semantic way. What experiments have addressed song semanticity? The fact that playback of a mere song fragment can be sufficient to elicit an agonistic response from a territory holder may be no more indicative of song syntax and semantics than the fact that a mere ‘you stupid?’ can be sufficient to start a brawl in a bar. Moreover, bees communicate ‘about’ food [85], many animals use calls that refer to different predators [86,87] and grey parrots can learn to use sounds to refer to objects [44]. Some great apes can also learn to use signs or sounds to refer to objects [88,89]. None of these examples captures all aspects of semanticity in human language, but they indicate that animals are in principle able to use arbitrary vocal or other gestures associated with meaning. Whether they do this as extensively as humans do, what subjects other than food and predators they might communicate about, and what role syntax plays should be the subject of (difficult but not impossible) investigations. Absence of evidence should not be taken as evidence of absence.

‘Displacement’, the ability to refer to absent events, things or concepts, is another design feature considered to be restricted to humans [90]. Again, what is the evidence that animals do not communicate about future events? What experiments have refuted this? Negative evidence is never fully satisfying for ruling out the existence of something, but we think the relevant experiments have not even been done yet. In fact, data from Nicky Clayton et al. are consistent with the possibility that birds of the corvid family act with an eye to future events, although this interpretation remains controversial [91–93]. If animals are indeed aware of the past and the future, they could in principle also communicate about them. In sum, we feel that there is insufficient evidence to conclude that vocalizations that externalize concepts, facts and intentions are necessarily an exclusively human domain. Needless to say, animal ‘thoughts’ may differ in fundamental ways from ours, which makes empirical research on this topic challenging.

In conclusion, we do not dispute that the sum of Hockett's design features characterizes human language. Nor do we expect animals to have a communication system that mirrors human language in all aspects. However, if all of Hockett's design features existed in some version in animals, this would help to shift the debate about the evolution of language from the categorical (‘language has unique attributes that do not exist in animals’) to the graded (‘different attributes of language exist in principle in other species, to varying degrees and with potentially different consequences’). Adopting the latter standpoint is bound to lead to new experimental impulses and less conjecture [94–96].

(b) The comparative approach

Deconstructing the complexity of language is a first step towards studying the brain and gene networks involved in its constituents in diverse non-human species. When molecular biologists first unravelled the mechanics of transcription and translation in bacteria, many scientists were sceptical about whether the emerging principles also applied to eukaryotes. Even though the existence of general biological principles at this level is no longer surprising, the finding that the Pax6 transcription factor and its target genes play a central role in eye formation across the entire phylogenetic tree was considered astonishing news, because the eyes of different taxa had clearly evolved in different ways and not from a common ancestral eye [97]. As more evidence emerges for conserved molecular toolkits, for instance for learning and memory in flies, slugs and mice [98], one wonders whether conserved molecular networks may also underlie learned vocal behaviour. Birds offer a great opportunity to address this question. Hundreds of songbird species, in addition to two other orders of birds, hummingbirds [60] and parrots [58], are known to learn their vocalizations by imitation. In contrast, there is only one human species, speaking more than 4000 languages. Comparing languages within our own species, the domain of linguists, has provided data suggestive of both common principles, shared by all languages, and specific parameters for each [30]. Likewise, although the behavioural strategies and neural systems mediating song in birds share common principles, their parameters differ across species [45]. We should not forget that there is usually much more genetic variation between species than within one. Talking about ‘birds’ in general when comparing song with language can, by definition, capture only the common principles of song, and ignores potentially interesting variation in parameters, some of which may be more comparable with language than others.

Returning to genes and molecules, looking for bird song principles and parameters may provide inroads into previously unrecognized structural determinants. These in turn could reflect common molecular ‘deep homologies’, as well as associated cellular and neural substrates. The same comparative approach is possible in mammals, for instance by tapping into the species diversity of bats with different social vocalizations [48–50,99]. A comparative approach is already being employed very successfully to study common genetic mechanisms in the evolution of social behaviour in invertebrates [100] and vertebrates [101].

4. FoxP2: hype and hope

FOXP2 has captured the imagination of scientists and laymen alike because it was the first gene causally related to a fairly specific speech and language phenotype, called developmental verbal dyspraxia (DVD; alternatively called childhood apraxia of speech, CAS, American Speech-Language-Hearing Association, 2007) [102]. DVD's core symptoms include inaccurate and incomplete pronunciation of words, difficulties in repeating multi-syllable nonsense words and impaired receptive speech [103–105]. Notably, FOXP2 belongs to a group of genes for which multiple studies have found clear evidence of positive selection in the hominin lineage [106,107].

The link between the transcription factor FOXP2 and language was first recognized in the large KE family spanning three generations. About half of the members of this family suffer from an autosomal dominant speech and language disorder, which was shown to be due to a heterozygous point mutation in FOXP2, inherited by all of the affected but none of the unaffected individuals [102]. Similar speech and language phenotypes exist in unrelated individuals with different FOXP2 mutations [108]. The original hype in the popular press, which touted the gene as ‘THE language gene’, has given way to a more differentiated picture of its role in language. This picture is based on in vitro and in vivo studies, including animal studies, which have made considerable progress in addressing the molecular and neural function of FoxP2 in different species [109,110]. We will review the evidence for the relevance of FoxP2 for neural development and for neural and behavioural plasticity in postnatal life. The hope is that, with increasing information about the molecular regulatory networks that FoxP2 participates in and about its function in different vertebrate and invertebrate model systems, we may not only learn whether FoxP2 is indeed another case of ‘deep homology’ [84,111], but also gain insight into how speech and language work mechanistically and how they evolved.

(a) Gene structure, molecular upstream regulators and target genes

FoxP2 is a member of the winged helix transcription factor family, characterized by a highly conserved Forkhead (Fox) domain that binds to distinct DNA sequences in the regulatory regions of its target genes. It can act both as a transcriptional repressor and as an activator of downstream genes [112,113]. A C2H2-type zinc finger may be relevant to DNA and protein interactions, and a leucine zipper domain is required for homodimerization and for heterodimerization with two other FoxP family members, FoxP1 and FoxP4. Dimerization is essential for FoxP2 to act as a transcriptional repressor, at least in reporter-gene cell culture assays [114,115]. FoxP2 also contains a glutamine-repeat region within the N-terminal part of the protein, akin to those occurring in poly-Q repeat disorders such as Huntington's disease and spinocerebellar ataxia [116], but in the case of FOXP2, the CAG repeat has not been linked to the speech pathology.

As the term transcription factor implies, FoxP2 regulates the transcription of other genes. Which genes those are is an interesting question for two reasons. On the one hand, identifying these so-called ‘downstream targets’ can help to pinpoint the cellular functions regulated by FoxP2 in a particular species. On the other hand, comparing and contrasting FoxP2 targets in non-human animals with those in humans could provide important cues for unravelling which functional changes occurred during evolution that might have contributed to the emergence of speech and language. Recently, direct neural targets of FOXP2 were identified in a human neuronal cell line [113] and in various human embryonic tissues [117]. Both studies used chromatin immunoprecipitation with antibodies that recognize FOXP2, combined with human promoter microarrays (ChIP-chip). The main set of candidate genes from these studies is proposed to play a role in neurodevelopment and neurotransmission, for example in neurite outgrowth and synaptic plasticity, suggesting that disturbances of these pathways may be involved in the speech and language disorder of patients with FOXP2 mutations [113,117]. FOXP2 ChIP-chip of human foetal brain tissue at 16–20 weeks of gestation revealed 84 specific target genes in the basal ganglia (BG) and 83 specific targets in the inferior frontal cortex. These sets differed markedly from those in human lung. The identity of most of these genes suggests specific neural functions of FOXP2 in regulatory networks for cell communication and signal transduction, pointing towards a prenatal FOXP2 function during central nervous system (CNS) development. Of the 285 proposed targets, 14 are also predicted to be under positive selection in humans.

In a study comparing gene expression in chimpanzees, rhesus macaques and humans, Caceres et al. [118] found that around 90 per cent of the genes differentially expressed in the brain were expressed at higher levels in humans, whereas no such bias was found in liver and heart. Of the FOXP2 target genes identified by Spiteri et al. [117], 47 also belonged to the set of genes expressed differentially in human and chimpanzee brains. Among these were genes relevant for CNS development and neural transmission, suggesting a potential role of some FOXP2 target genes in human-specific traits. Konopka et al. [119] explored whether the human form of FOXP2 may have different functions from the chimpanzee version by expressing chimpanzee or human FoxP2 in human neuronal cell lines. Subsequent microarray analysis showed that the expression levels of 116 genes differed quantitatively. As this set included genes active in pathways and tissues relevant for speech and language, the authors speculated that the human version of FOXP2 might have contributed to the evolution of these traits.

The differentially regulated sets are enriched for genes involved in transcription, cell–cell signalling, and protein and cell regulation. Using network analysis, the authors found key players in brain development and function, e.g. DLX5 and SYT4, among the highly connected genes, and observed other relevant candidates that seem to be co-regulated with some of the differentially expressed putative targets [119]. One of the downstream targets of FOXP2, CNTNAP2, has recently received special attention. Particular single nucleotide polymorphisms in CNTNAP2 are associated with core deficits in children with specific language impairment and also coincide with language delays in autistic children [120]. In songbirds, CNTNAP2 is differentially expressed in some song control nuclei, but whether FoxP2 regulates CNTNAP2 in songbirds has not yet been addressed [121]. These findings are encouraging in light of potential deep homologies between human speech and bird song.

To understand the role of FOXP2 in cellular and behavioural function and how this might have changed during evolution, one also needs to identify how the transcription of FoxP2 is regulated. Indeed, Carroll has argued that changes in gene regulation via non-coding sequences, including transcriptional cis-regulatory elements, the untranslated regions of messenger RNAs and RNA-splicing signals, are a more important force than coding-sequence evolution in the morphological and behavioural evolution of hominins [122]. Two putative upstream regulators of FoxP2 transcription have been described, one in zebrafish (Danio rerio) [123] and one in the zebra finch (Taeniopygia guttata) [124]. Lef1 is a transcription factor that is activated via the Wnt/β-catenin signalling pathway. Its decreased expression or inhibition during zebrafish brain development leads to decreased FoxP2 expression in particular brain areas. Furthermore, enhancer regions within the FoxP2 genomic DNA sequence contain several Lef1 binding sites, and two of these enhancers bind Lef1 in chromatin immunoprecipitation analysis in zebrafish. In juvenile zebra finches, administration of a cannabinoid agonist increases FoxP2 expression in the striatum, and this increase persists into adulthood. Whether the influence of cannabinoid signalling on FoxP2 expression in brain regions relevant for learning and practising song is due to direct interactions needs further study.

To recap, many potential FOXP2 target genes have been identified, including CNTNAP2, which has also been linked to language impairments [120]. Which target genes operate in which neurons remains to be discovered. Answering this will narrow down hypotheses about the function of FOXP2 in different tissue contexts. The discovery that hetero- and homodimerization is an essential feature of the FoxP protein family offers the opportunity for combinatorial, cell-type-specific fine control of gene expression.

(b) FoxP2 expression studies during brain development and in postnatal brains

There are two general, but not mutually exclusive, possibilities for how FOXP2 affects speech and language in humans. The protein could be involved in forming speech circuits, or it could affect speech learning, perception and/or production itself. The existing evidence suggests that both are likely (tables 1 and 2). FOXP2 is already expressed in human foetal brains at sites that develop into the structures that show morphological and functional abnormalities in patients with FOXP2 mutations [125]. This is consistent with FOXP2 playing a role in establishing speech-relevant circuits prenatally. The expression patterns in human foetal brains are highly similar to those in mice at a comparable embryonic stage. FoxP2 is also expressed during embryonic development in homologous brain regions of monkeys, various rodent species, different bird species, frogs and fish [112,126–128,136,140–144]. These findings stress that FoxP2 is not exclusive to humans but is likely to be relevant for the development of homologous brain regions in many vertebrates. A developmental role is underscored by the fact that homozygous Foxp2 mouse mutants (see below) develop cerebellar anomalies [130–133].

Table 1.

Developmental role of FoxP2 in vertebrate species. Many of the observations listed cannot clearly distinguish between purely developmental effects and combined developmental and adult ‘online’ effects. They are listed here only when observations were made during development of the organism.

Table 2.

Post-organizational role of FoxP2 in vertebrate species. Many of the observations listed cannot clearly distinguish between effects that result from adult ‘online’ deficits and effects that manifested during development. They are listed here only when observations were made during adulthood.

Continued expression of FoxP2 in postnatal and adult brains of different species suggests functions beyond developmental patterning. Foxp2 and its close homologue Foxp1 are expressed in different subpopulations of projection neurons in the cerebral cortex of young mice. They might therefore serve different functions in establishing and shaping distinct cortical circuits at early postnatal stages [129]. In adult mice, Foxp2 continues to be expressed in several structures: layer 6 of the cortex, the striatum, the dorsal thalamus, the deep cerebellar nuclei and Purkinje cells, and the inferior olive in the medulla [126,128]. Except for the cortical expression, this pattern is also characteristic of a number of bird species and of crocodiles [136]. In the striatum of mice and songbirds, FoxP2 is expressed by medium spiny neurons, where it co-localizes with DARPP32 and dopamine D1 receptors [136,147]. Within the mammalian striatum, the striosomal and matrix compartments express neurochemical markers differentially and are characterized by distinct projection patterns [151]. In macaques, FoxP2 expression is enriched in the striosomal compartment [141]. Most attention so far has focused on the significance of FoxP2 in cortico-striatal and cortico-cerebellar networks because of their clear connection to speech. Considering other FoxP2-expressing regions, however, may point to further common cellular and behavioural roles. For instance, the inferior olive strongly expresses FoxP2 in all species studied and, like the striatum, is important for the timing of motor behaviour [152,153]. In addition, several auditory structures express FoxP2 [127,128,136]. These include the auditory midbrain nucleus inferior colliculus and its avian equivalent MLd, as well as the auditory thalamic medial geniculate nucleus (MGN) and its avian equivalent nucleus ovoidalis.

Although neural FoxP2 expression patterns are overall strikingly conserved among vertebrate species, some interesting heterochronic changes exist. For instance, FoxP2 expression declines dramatically in monkey brains postnatally, disappearing first from the putamen and later from the caudate nucleus of the BG, whereas expression persists in homologous parts of the rat brain. In mammals, neurons begin to express FoxP2 only after their last mitotic division. In songbirds, by contrast, some dividing neuroblasts in the ventricular zone already express FoxP2. This expression is maintained into adulthood, which also differs from adult mice, whose neurogenic zones do not express Foxp2 [137].

Postnatally, songbird FoxP2 expression is dynamically regulated in a BG structure called Area X. During development, songbirds acquire their species-specific and individual-specific song by imitating the sounds of adult conspecifics, and Area X is required for this learning [154,155]. Once song is stably learned, Area X continues to be relevant for online monitoring of song [154–156], and without Area X, normal song production deteriorates [157]. Juvenile male zebra finches consistently express 10–20% more FoxP2 mRNA within Area X than in the surrounding striatum during their song-learning phase. This difference in FoxP2 expression is not related to immediately preceding singing activity, because the birds in this study had not sung before sacrifice [136]. Other regions involved in controlling the learning and production of song show very low FoxP2 expression [127,136]. A further correlation between song plasticity and FoxP2 expression levels exists in another songbird species, the canary. During the breeding season, male canaries sing highly regular, stereotyped song, and FoxP2 expression in Area X is low. After the breeding season, song becomes variable and new syllables are incorporated, and FoxP2 in Area X is concomitantly upregulated [136]. As with the juvenile zebra finches, the canaries did not sing prior to sacrifice, so the different FoxP2 mRNA levels cannot be explained by a direct link to singing activity. However, such an acute link between singing and FoxP2 expression has also been reported in zebra finches. When adult male zebra finches do not sing, or sing undirected song (not directed towards another bird), FoxP2 mRNA is significantly higher in Area X than in the surrounding striatum. In contrast, when adult birds sing female-directed courtship song, FoxP2 mRNA levels in Area X are lower than in the surrounding striatum [149].
In contrast to the mRNA results, after 2 h of either directed or undirected singing, FoxP2 protein levels in Area X (normalized to glyceraldehyde-3-phosphate dehydrogenase (GAPDH)) are lower than in a non-singing group [150]. The difference between mRNA and protein levels could reflect interesting regulatory mechanisms, but it is currently difficult to evaluate for two reasons. The variability of the protein data within groups is quite high, and it is unclear whether GAPDH is suitable for normalization in avian Area X [158]. In addition, the most consistent and strongest difference in FoxP2 expression is an increase that occurs in non-singing birds during the first 2 h of daylight, before they start singing [150]. During the late phase of song development, at 75 days after hatching, FoxP2 mRNA levels in Area X decrease after 2 h of undirected singing, as in adult birds. This decrease occurs in hearing as well as in deafened birds. Both constitutive and singing-induced changes in FoxP2 levels in Area X might thus be tied to song plasticity during development as well as in adulthood [136,138]. Whether FoxP2 acts as a positive or negative regulator at the cellular level awaits further study. Interestingly, 75-day-old birds, whose song imitation is already quite good, show higher song and song-sequence variability after 2 h of singing than after 2 h of silence [159]. Curiously, in 50–57-day-old birds, imitation of the tutor's song improves during each day but deteriorates slightly overnight, before improving again the next day [160]. It will be interesting to pursue whether the singing-induced variability noted by Miller et al. [159] relates to the daily improvement of imitation noted by Derégnaucourt et al. [160].

Several other genes are specifically regulated at the mRNA level in zebra finch Area X by undirected singing behaviour [161]. For example, two histone family members (H2AFX; H3f3B), two heat-shock proteins (Hsp25; Hsp90α) and calcyclin-binding protein (Cacybp) are upregulated, whereas ARHGEF9 [161] is downregulated. Mutations of the latter gene are involved in human mental retardation and sensory hyperarousal [162,163].

Interestingly, FoxP2 is also expressed in the song circuitry of species from two other avian orders of vocal learners, hummingbirds and parakeets. These three orders are not linked by an immediate common ancestor [164]. It therefore seems parsimonious to assume that songbirds, parrots and hummingbirds evolved the neural circuitry for vocal learning independently. However, an alternative hypothesis is also possible [165]: a common ancestor of all extant birds may have possessed this trait, which was subsequently lost (to varying degrees) in non-vocal learners. When considering both scenarios, one should bear in mind that only a few species have unequivocally been shown not to exhibit vocal learning [166]. There may also be as yet unrecognized intermediate phenotypes between accurate imitative ‘production’ learning and ‘usage’ learning [56]. If so, such species should be examined for neural structures similar to those of hummingbirds, parakeets and songbirds, as well as for their FoxP2 expression patterns. Together, these kinds of experiments are necessary to determine how universal the deep molecular homologies underlying the neurobiology of learned motor behaviours really are.

In mice, activity-driven Foxp2 expression has also recently been noted in the MGN, the principal auditory relay nucleus of the thalamus. Foxp2 is strongly expressed after white-noise stimulation of young mice [135]. Future expression studies should take into account that activity-driven FoxP2 expression may contribute to expression differences, such as the developmental downregulation in monkey striatum.

In insects, only one FoxP family member exists [167,168], indicating that the four FoxP paralogues in vertebrates arose via gene duplication. Investigation of the functional role of FoxP in Drosophila is just beginning [169]. Together with information about its target genes, this work may provide rich insights into deeper levels of molecular homology underlying similar cellular and behavioural functions. In honeybees (Apis mellifera), AmFoxP mRNA is expressed at higher levels in the brains of worker bees around 4 days after eclosion. At this stage, bees become foragers, searching for food and communicating information to their hive members by dancing. Unfortunately, it is not known whether AmFoxP expression also increases in drones, which leave the hive but do not dance. Division of labour or caste does not seem to influence AmFoxP expression levels in the bee brain. One day after eclosion, foragers express AmFoxP in cell bodies clustered in the optic, protocerebral and dorsal lobes. The dorsal lobe has been suggested to play a role in processing mechanosensory information. However, without knowledge of the projection patterns of the AmFoxP-expressing neurons, it is currently difficult to make predictions about their function [168].

In summary, heterochrony of FoxP2 expression exists in different bird species with respect to development, season and circadian rhythm. Heterometry (a change in expression level) occurs as a result of acute singing activity and social context. How these changes are regulated is one of the challenges for the future (figure 1). Altered expression levels during song learning in birds and after auditory stimulation in mice are compatible with a role for FoxP2 in auditory processing, in vocal motor behaviour or in the integration of both. Evoked brainstem responses after auditory stimulation in mouse mutants (see below) further support this idea.

Figure 1.

Schematic depiction of the levels at which differential regulation of FoxP2 and its target genes can affect cognitive, motor and peripheral functions in different organs and species. FoxP2 is controlled by a largely unknown set of upstream regulators. It regulates a large set of target genes, which are known for only a very limited number of species, cell types and developmental stages. FoxP2 acts in cortical, subcortical and peripheral areas of different vertebrate species. The future challenge is to discover how the similarities and differences in the amount, location and timing of FoxP2 expression in different species are regulated. A related challenge is to discover how these differences in turn affect its downstream targets and ultimately behaviour.

(c) Functional analysis of Foxp2 in mutant mice

Foxp2 mouse models include knock-out mice [130,131,133] and mice that carry aetiological mutations mirroring those of the KE family or other patients [132]. They also include mice in which the murine Foxp2 gene has been ‘humanized’ by introducing the two human-specific amino acids [134]. Heterozygous mice carrying a Foxp2 null allele or an aetiological mutation show a 50 per cent reduction in Foxp2 protein and thus mirror the haploinsufficiency of patients with FOXP2 mutations [102]. Homozygous FOXP2 mutations in humans have not been reported, presumably because they are lethal. Likewise, mice carrying two alleles with aetiological or loss-of-function Foxp2 mutations die a few weeks after birth [130,131,133]. Some studies report that heterozygous mice have subtle abnormalities in brain morphology as well as mild developmental delays [130,133]. Others report that heterozygotes are macroscopically normal [131,132]. These discrepancies may relate to modifier effects of the genetic background, given that some of these investigations did not back-cross the mutant mice [109].

Altered auditory brainstem processing was observed in mutant mice (Foxp2-R552H) that carry the equivalent of the human aetiological R553H point mutation in the forkhead domain. It was not observed in mice with a Foxp2 mutation mirroring the human aetiological R328X mutation, which truncates the protein before the known DNA-binding and protein-interaction domains [148]. These results may reflect preserved dimerization capacity of the R553H but not the R328X mutant protein. The two mutant proteins also differ in their intracellular localization when expressed transiently in human neuron-like cell lines [170]. How much interactions with other Foxp proteins and differences in intracellular localization contribute to these effects is not known, but both questions are accessible to further study.

Analysis of vocalizations in Foxp2 mutant mice has so far been limited to pup vocalizations, which are thought to be innate. A reduced Foxp2 dosage does not prevent pups from producing audible calls, ultrasounds and clicks [132]. This is so even though KO and R552H-KE mutants tested under less stressful conditions did not produce any ultrasonic isolation calls [130,133]. Heterozygous mouse models carrying either the Foxp2-R552H missense or the Foxp2-S321X nonsense mutation produce all sound types. They do not show any significant differences in the temporal domain beyond those routinely observed between mouse strains [171]. It is not known whether adult ultrasonic mouse courtship song [46] and its neural substrates differ between Foxp2 mutants and wild-type mice. Nor is it known whether mouse song is learned via imitation, like speech and birdsong. Four muroid species with different levels of vocal complexity do not show species-specific Foxp2 expression patterns that can be related to their different vocal abilities [172]. That study does, however, report interesting instances of heterotopic Foxp2 expression in parts of the cortex and amygdala in the different species studied. Specific information about mouse vocalization pathways and about a potential role of Foxp2 in them is lacking. In this context, it is interesting that Groszer et al. [132] describe significant behavioural and physiological differences between R552H heterozygous mice and their littermates. Mutants with reduced Foxp2 dosage perform worse on the tilted voluntary running-wheel system and on accelerating rotarods. This is consistent with impaired motor-skill learning and affected frontostriatal and/or frontocerebellar circuitry. They also show impaired long-term depression (a form of synaptic plasticity) at glutamatergic synapses of the dorsolateral striatum. Cerebellar synaptic plasticity also differs subtly, although the circuitry was found to be grossly normal.
Also consistent with a role for FoxP2 in the olivo-cerebellar [173,174] and cortico-striatal [153] systems are findings that affected KE family members perceive rhythmic differences in stimuli less well and imitate rhythms less accurately than control subjects [146]. The bottom line of the studies in mice concerns reduced levels of Foxp2 protein, equivalent to the amounts of cellular protein in patients carrying heterozygous FOXP2 mutations. Such reductions affect neuronal plasticity in the striatum and motor learning but have less of an effect on vocalizations.

(d) Functional analysis of FoxP2 in songbirds

Birdsong, just like human speech and language, has to be learned by integrating auditory input with vocal motor output through practice. To search for a causal relationship between FoxP2 and learned complex vocalizations, FoxP2 was knocked down using lentivirally mediated RNA interference in the striatal BG nucleus Area X of male zebra finches during the sensorimotor learning phase [139]. Birds with reduced FoxP2 levels in Area X did not learn a complete copy of their tutor's song, omitting several of the tutor's syllables. Furthermore, they copied the imitated syllables less accurately than did the control birds. In contrast to their control siblings, which expressed normal FoxP2 levels, they also sang more variably from rendition to rendition [139]. This production variability also characterizes people with FOXP2 mutations [145]. With the current animal models, it is difficult to dissect unambiguously the relative contributions of sensory, motor and sensorimotor integration deficits to the impairments. However, a number of findings suggest that the deficits resulting from FoxP2 knockdown are not restricted to motor performance. Imitation success already differs significantly between FoxP2 knockdown and control zebra finches during the learning phase, and it then plateaus early in the knockdown birds. In contrast, variability in the production of song elements (‘syllables’) continues to increase in knockdown finches when it no longer does in controls [139]. A more definitive dissection of a sensory role for FoxP2 in the BG would require conditional knockdown of FoxP2 during the purely auditory learning phase. This would need to be done in a bird species whose sensory memorization phase precedes motor practice by months, such as the canary, followed by reactivation of FoxP2 expression when vocal motor practice begins.

Do these findings allow us to make any predictions about the role of FoxP2 at the neural level? The anterior forebrain pathway of song-learning birds echoes the mammalian cortico-BG-thalamo-cortical loops, but important differences exist. As in the mammalian striatum, cortical glutamatergic afferents to the songbird striatal nucleus Area X synapse onto spiny neurons. These neurons have histochemical and electrophysiological features that strongly resemble those of mammalian medium spiny neurons. The songbird cortical input to the spiny neurons of Area X is also modulated presynaptically by midbrain dopaminergic input. However, Area X also contains aspiny, tonically active, fast-firing GABAergic neurons similar to mammalian pallidal neurons [175]. Recording from Area X in singing birds, Goldberg et al. [176] recently identified two types of these neurons. They differ in connectivity and firing pattern in much the same way as the two types of pallidal neurons in primates. Importantly, Area X has slightly different connectivity patterns from the surrounding songbird striatum [177]. These differences could reflect the kind of small evolutionary modifications postulated for new traits, such as avian vocal learning.

What are the cellular consequences of knocking down FoxP2 in Area X spiny neurons? Schulz et al. [178] found that spiny neurons with knocked-down FoxP2 levels had fewer spines than control-injected neurons. This effect was even more pronounced when neurons received the knockdown before differentiation, i.e. as neuroblasts in the ventricular zone, where adult neurogenesis takes place.

To sum up, knocking down FoxP2 in striatal Area X of young zebra finches during their song-learning period caused song impairments that phenotypically echo aspects of developmental verbal dyspraxia in humans. Like patients with FOXP2 mutations, birds with reduced levels of FoxP2 do not develop their full articulatory potential and produce their smaller set of vocal elements more variably than is typical. FoxP2 knockdown in adult Area X affects the structural plasticity of dendritic spines. This finding nicely complements the data from mouse models demonstrating altered synaptic plasticity [132,134,179].

(e) Evolution of FoxP2

The evolution of FoxP2 is enigmatic. On the one hand, the DNA sequences for most of the species studied differ by less than 10 per cent. This, together with the similar brain expression patterns in vertebrates, suggests that the protein fulfils an evolutionarily ancient and important role, not limited to humans. On the other hand, there is clear evidence that the human FOXP2 gene was subject to a selective sweep during hominin evolution [106,180], as was a subset of putative FOXP2 target genes [117]. These findings, together with the gene's role in language, have led to the proposal that the human FOXP2 gene is one of the things that ‘helped us become human’.

What, then, is the nature of the differences between the non-human and human FOXP2 genes? Only two amino acid changes distinguish the chimpanzee and human protein sequences, not counting an extra glutamine within the polyglutamine repeat of chimpanzee FoxP2. Neither of the two amino acids lies within the characterized functional domains of the protein, so their molecular function is unclear. Do these differences contribute to the fact that humans speak and chimpanzees do not? To test this idea, Enard et al. [134] introduced the two human-specific amino acids into the murine Foxp2 locus. This version of FOXP2 is only partially humanized, because only the coding changes were introduced. Interestingly, mouse pups carrying it produced isolation calls that differed bioacoustically from those of control mice. In addition, these mice showed less exploratory behaviour and altered synaptic plasticity of striatal medium spiny neurons. They also had lower dopamine levels in five brain regions including the frontal cortex and the caudate-putamen, and their neurons had longer dendrites in culture [134]. In a follow-up study, the humanized version of FOXP2 introduced into mice was shown to specifically affect cortico-BG circuits, but not cortico-cerebellar circuits [179]. These findings are consistent with human FOXP2 functioning differently from the wild-type mouse version. However, it is unclear whether chimpanzee FoxP2 would also behave differently in a mouse background, and if so, how.

One of the human-specific amino acids also occurs in carnivores, which calls into question the uniqueness of the link between the two human-specific amino acids and the human language capacity. More importantly, a recent analysis of genomic regions up- and downstream of the region coding for these two amino acids raises the possibility that the selective sweep was in fact not associated with the two human-specific substitutions [181]. Furthermore, the emergence of the human-specific FOXP2 sequence was originally dated to around 260 000 years ago. This estimate made it concomitant with the emergence of cultural artefacts thought to be related to the evolution of language. However, recent data show that Neanderthals already possessed the human-like FOXP2 version [182]. Yet their lineage diverged from the one leading to modern humans approximately 300 000–400 000 years ago, before the time at which language has so far been thought to have arisen. Alternatively, the Neanderthal result may reflect gene flow between Neanderthals and modern humans or contamination of the Neanderthal samples with modern human DNA [183].

A final interesting, but currently not understood, finding concerns bats, which show much greater sequence divergence of FoxP2 than any other vertebrates studied [184]. It is tempting to speculate that this diversity might be related to echolocation or to the different vocal behaviours that bats exhibit [61,185,186]. However, comparisons of FoxP2 sequences between other animals that learn their vocalizations and those that do not have not revealed sequence variants that segregate with vocal learning [187,188].

One of the great challenges in characterizing FoxP2's function and its evolutionary contribution to vocal learning is to understand the dynamics of the gene's expression in different species. A number of questions need to be answered:

(i) How do non-coding DNA sequence changes affect where and when FoxP2 is expressed in different species?
(ii) How do coding changes affect the structure of the protein and its interactions with other proteins and with DNA?
(iii) What is the role of differential expression of proteins that interact with FoxP2?
(iv) How does FoxP2 expression respond to internal and external signals?

All of these factors could be important sources of evolutionary change. Songbirds are a fruitful model system for exploring heterochrony, since Area X shows age-related differences, seasonal changes and differences co-occurring with singing style [127,136,138,149]. It will be interesting to extend these studies to further songbird species. Of particular interest are species that vary in the timing of song learning, that sing in behavioural contexts in which FoxP2 expression has not yet been tested, or that differ in adult song plasticity. In addition, more effort should be directed at identifying the genomic loci that regulate temporal differences in FoxP2 expression.

5. Conclusions

Changes in the regulatory regions of genes can alter the timing, amount or location of gene expression in the course of evolution [6]. Likewise, changes in the protein-coding sequence can bring about altered gene products with different functions [7]. In turn, both can result in changes to neural circuitry, as amply attested by differences in neuroanatomy among species. Whether FoxP2 played a role in bringing about circuit changes that facilitated the emergence of human language is not clear. Cell lineage analyses and studies of neural microcircuitry in mice carrying the humanized FOXP2 allele could be a step in the right direction. However, a complex behaviour like language is bound to be a polygenic trait. Other genes that presumably need to act together with FOXP2 might therefore not be present or active in this mouse model.

Alternatively, one could imagine that other genes brought about the circuit changes required for vocal learning. FoxP2, which already functioned in the precursor circuits, would then either have acquired new importance because it operated in a new environment, or the gene itself would also have changed and altered its function. In songbirds, song nuclei are embedded in regions that are active during stereotyped motor behaviours like hopping and walking [189]. In this evolutionary scenario, FoxP2 expression in striatal Area X would have become useful for sensorimotor integration or for precise timing of vocal gestures. This would distinguish it from its role in other motor learning tasks carried out by cells in the adjacent non-vocal circuitry. In the human lineage, once the striatum had become connected to other regions necessary for vocal learning, FOXP2 may then have acquired its human-specific changes, which might have affected neural transmission. This would be a two-hit scenario for FOXP2's role in language evolution, with circuit changes predating changes in gene function. From the postnatal studies in mice and birds, it is clear that FoxP2 plays a role in the neural plasticity of certain circuits. As homozygous mutations are lethal in mice, one can assume that Foxp2 also plays a role in development. Whether this is true for brain circuits that are relevant for vocal learning in humans and birds is not clear.

To reiterate, both human speech and some animal vocalizations, such as the song of many bird species, are forms of auditory-guided vocal motor learning. Research into the neural and genetic substrates of vocal learning in humans and in animal models is starting to reveal similarities and differences. Vocal learning may really be the hallmark of a select few species and represent a case of parallel evolution. Alternatively, it may be an ancient trait that exists to some degree in many more species that have not yet been stringently tested. This question should be explored further. Each species exhibits adaptations that are a mix of traits it shares with others and some that may be unique. Depending on one's view of the position of humans in the universe, one finds either the aspects we share with animals or those that set us apart more appealing to study. We are certainly limited by our own species-specific cognitive biases. Nevertheless, maintaining as agnostic a stance as possible about what animals can and cannot do, until claims are supported by experimental evidence, will help to unravel whatever precursors of language exist in our various animal relatives.


We thank Christopher K. Thompson, Anna Zychlinsky, Roland L. Knorr and particularly the two reviewers for very helpful comments and edit suggestions. Supported by funds from the SFB 665.


