Human language is both highly diverse—different languages have different ways of achieving the same functional goals—and easily learnable. Any language allows its users to express virtually any thought they can conceptualize. These traits render human language unique in the biological world. Understanding the biological basis of language is thus both extremely challenging and fundamentally interesting. I review the literature on linguistic diversity and language universals, suggesting that an adequate notion of ‘formal universals’ provides a promising way to understand the facts of language acquisition, offering order in the face of the diversity of human languages. Formal universals are cross-linguistic generalizations, often of an abstract or implicational nature. They derive from cognitive capacities to perceive and process particular types of structures and biological constraints upon integration of the multiple systems involved in language. Such formal universals can be understood on the model of a general solution to a set of differential equations; each language is one particular solution. An explicit formal conception of human language that embraces both considerable diversity and underlying biological unity is possible, and fully compatible with modern evolutionary theory.
Because of its central role in human culture and cognition, language has long been a core concern in discussions about human evolution. Languages are learned and culturally transmitted over generations, and vary considerably between human cultures. But any normal child from any part of the world can, if exposed early enough, easily learn any language, suggesting a universal genetic basis for language acquisition. In contrast, chimpanzees, our nearest living relatives, are unable to acquire language in anything like its human form. This indicates some key components of the genetic basis for this human ability evolved in the last 5–6 Myr of human evolution, but went to fixation before the diaspora of humans out of Africa roughly 50 000 years ago. Darwin recognized a dual basis for language in biology and culture: ‘language is … not a true instinct, for every language has to be learnt. It differs, however, widely from all ordinary arts, for man has an instinctive tendency to speak, as we see in the babble of our young children; while no child has an instinctive tendency to brew, bake or write’ [1, p. 55].
Attempts to understand the diversity or the unity of human languages can select as their focus from among a variety of potential genetic, developmental and cultural/historical explanatory factors. As a result, the literature on human language universals is full of competing models and long-running arguments, spanning many disciplines including linguistics, evolutionary biology, anthropology, psychology and history. My goal in this review is to summarize and synthesize this often contentious literature from a biological viewpoint, surveying both abstract universals underlying human language and the considerable diversity of human languages.
My starting point will be the perspective on language developed by Darwin, in which all humans are born with an instinctual desire to learn language, and the neural equipment to do so. Darwin emphasized the aspects of human cognition shared with other animals, but he also recognized that certain aspects of our behaviour demand special explanation. Considering the biology of language, Darwin saw birdsong as the nearest animal analogue, because young songbirds must learn their song by listening to conspecifics. This leads to ‘dialect’ differences within a species, partly analogous to the diversity of languages. In modern terms, both birdsong and language are acquired via a specialized ‘instinct to learn’. Despite a polarizing tendency among modern scholars to classify human language as either ‘learned’ or ‘innate’, a Darwinian perspective explicitly embraces both of these factors.
My second core assumption is that the human capacity to acquire language is composed of multiple separable but interacting mechanisms, no one of which alone is adequate for language acquisition [4,5]. While some of these mechanisms may be unique to humans and to language (the subset termed the ‘faculty of language in the narrow sense’, or FLN), most of them will be shared in what we termed the ‘faculty of language in a broad sense’ (FLB). Clearly, this broad set of mechanisms, not the uniquely human subset, makes up the human ‘instinct to learn language’. It is irrelevant to the child acquiring language whether some component of its innate endowment is unique to our species, or shared broadly with other primates or vertebrates; what matters is that the capacity itself need not be learned, and thus provides a leg up during language acquisition.
If most of the mechanisms underlying human language are shared with other species or cognitive domains, why mention FLN at all? One reason is interdisciplinary: for many scholars, particularly linguists, the term ‘language’ connotes this ‘special’ subset of cognitive mechanisms, and FLN provides a moniker that is less apt to be misunderstood than ‘language’. Thus, statements about ‘language’ that might seem nonsensical when applied to FLB may be perfectly reasonable if they concern FLN. Another important reason is cautionary: the subset of mechanisms that comprises the FLN will be the most resistant to comparative study, and their study will be particularly difficult and may demand different approaches than most aspects of human biology. But, as clearly stated by Fitch et al., FLN is not the only, or even the most important, focus of biolinguistic research. This point will resurface repeatedly in the current paper.
A final set of assumptions incorporates some widely accepted observations from modern linguistics. First, although every child can learn their native language(s) with little or no explicit tuition, language acquisition is a supremely complex task . Despite five decades of research, and billions in funding, our most powerful computers are still not up to the task. Nor have linguists been able to create a complete and adequate grammar for any single language. The second observation is that every language can flexibly and creatively communicate thoughts between its speakers and listeners . Although languages vary considerably in the ways in which they do so, and in the complexity of different subcomponents of language, no language is in toto superior or ‘more complex’ than any other (possible exceptions include very young languages, such as creoles, but even here opinions are divided [8,9]). The persistent notion that some languages are ‘better than’ others, in one way or another, is today seen as a parochial myth. Third, a vast store of information in any human language must be learned (least controversially, every word of every language is learned), and thus contemporary debates concern not this fact, but whether a human child is born with a set of mechanisms or constraints that help this learning along [10,11]. No linguist believes that ‘language is innate’ in any simple superficial sense.
Beyond these basic facts, both the existence of language universals and their innate basis are highly controversial topics. Despite a long history of study, even the existence of language universals has recently been termed a ‘myth’. Although few modern commentators deny that the child's capacity to rapidly acquire its language(s) rests upon some genetic basis, debate rages over whether this genetically given endowment is specific to humans or specific to language (e.g. [4,14,15]) and whether it represents a specific adaptation for language or an unselected by-product of other factors such as constraints on brain development [16–20]. While many see the cultural evolution of individual languages as a route to understanding the biological basis for language acquisition, others see it as an argument against any evolved genetic basis. Still others see cultural change as demanding new paradigms for thinking about language as an evolved trait [23,24]. Recent attempts to extend biological theory and methodology to incorporate cultural change include phylogenetic techniques originally developed by evolutionary biologists, extension of niche construction theory to the cultural domain and development of selection-based models of cultural evolution and cultural group selection [27–29]. At present, these new perspectives remain poorly integrated into the long-running debate concerning linguistic universals and diversity.
In this review, I begin by defining some terminology, and then concisely review the literature concerning language universals and language diversity. This review clearly indicates that both diversity and universality of various kinds exist, and require biological explanation. I argue that the traditional approach to this problem, which dichotomizes between ‘general purpose’ and ‘specially adapted’ mechanisms, leads down a blind alley, and has been an unproductive focus of debate. I suggest that a focus on specific neural and genetic mechanisms involved in language acquisition is more likely to be illuminating, and that such mechanisms are unlikely to fall into neat categories, whether psychological (e.g. specialized versus general purpose) or linguistic (e.g. phonology, syntax and semantics). A generalized evolutionary theory incorporating both cultural and phylogenetic change must both embrace linguistic diversity and continue searching for language universals and their mechanistic basis. As in biology more generally, a thorough study of diversity is necessary to delineate universal constraints. These are not competing, alternative approaches. Finally, as a first step in this direction, I sketch a conceptual framework, modelled on differential equations, that easily incorporates unity and diversity into a comprehensive, explicit framework.
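The differential-equation analogy can be made concrete with a deliberately simple textbook case. The equation below is chosen purely for illustration and is an assumption of this sketch; it is not taken from the framework developed later in the review.

```latex
\frac{dy}{dt} = -k\,y, \quad k > 0
\qquad\Longrightarrow\qquad
y(t) = C\,e^{-kt}, \quad C \in \mathbb{R}.
```

Here the one-parameter family $y(t) = C e^{-kt}$ is the general solution, and each choice of $C$ (fixed, for instance, by the initial condition $y(0) = C$) picks out one particular solution. On the analogy suggested in this review, formal universals correspond to the general solution and each individual language corresponds to one particular solution within it.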
I use ‘language’ to denote any system that freely allows concepts to be mapped to signals, where the mapping is bi-directional (going from concepts to signals and vice versa) and exhaustive (any concept, even one never before considered, can be so mapped). Although there is nothing restricting language to humans in this definition, by current knowledge only humans possess a communication system with these properties. Although all animals communicate, and all vertebrates (at least) have concepts, most animal communication systems allow only a small subset of an individual's concepts to be expressed as signals (e.g. threats, mating, food or alarm calls, etc.).
I will restrict my use of the term ‘evolution’ to change in gene frequency in populations (its modern Darwinian sense). Considerable misunderstandings have been created by the use of ‘language evolution’ to refer to the purely cultural, historical process whereby a language like Latin morphed over time into French, Spanish or Italian; I adopt the term ‘glossogeny’ to refer to this form of cultural, historical change, following Hurford, and when necessary ‘phylogeny’ to denote biological evolution. Study of the biology of language must include both phylogenetic and glossogenetic components [3,31].
Darwin freely used the words ‘innate’ and ‘instinct’ [1,32,33], but, despite its wide use in psychology and linguistics and despite some impassioned biological defences, the term ‘innate’ is today seen by some biologists as hopelessly confused and confusing. Nonetheless, some genetic basis for language acquisition is implied by the very notion that the ‘instinct to learn language’ evolved. The term ‘innate’ can defensibly be used as a shorthand for ‘reliably developing’ or ‘canalized’. An ‘instinct’ is any innate cognitive mechanism or behaviour pattern, including those mechanisms underlying learning. Thus, there is no contradiction in postulating an ‘instinct to learn’ language [2,38,39], and seeing its study as a central component of biological linguistics. Only an outmoded and oversimplistic view sees nature and nurture as dichotomous opposing explanations, rather than complementary aspects of epigenetic developmental explanations.
2. Unity and diversity of language from the viewpoint of linguistics
(a) Language universals and ‘universal grammar’
Although the modern use of the term ‘universal grammar’ is today mostly connected with the ideas of Noam Chomsky, both the term and concept have a far older history (cf. [41,42]). In its original usage, universal grammar denoted those aspects of a language that are so general and widely shared that they do not need to be mentioned in the particular grammar of any one language. For example, in 1788, James Beattie said of languages that ‘though each has peculiarities, whereby it is distinguished from every other, yet all have certain qualities in common. The peculiarities of individual tongues are explained in their respective grammars and dictionaries. Those things, that all languages have in common, or that are necessary to every language, are treated of in a science, which some have called universal or philosophical grammar’. Such facts as ‘languages contain meaningful words’ or ‘utterances express meanings’ were seen as too obvious to require mention in a grammar of Latin or French. Of course, such general principles might not be obvious to a Martian or a chimpanzee; ‘obvious’ does not imply ‘logically necessary’. Understanding this broadly shared basis for language, whatever it might be, was seen as central to understanding human nature by many eighteenth-century philosophers.
In this original form, there was a fairly transparent connection between the notion of ‘language universals’ and universal grammar, and one implied the other. However, by the 1960s a far broader understanding of the world's linguistic diversity made it seem unlikely that all languages would share any particular superficial features. In a seminal volume, a team of structuralist linguists led by Joseph Greenberg initiated the modern search for universals with an acknowledgement of this fact. Greenberg and colleagues distinguished between several classes of regularities (‘universals’ in a ‘somewhat extended sense’ [43, p. xviii]). Such regularities go beyond the truly universal regularities expected by Beattie. In particular, this new search for cross-linguistic regularities sought two new categories of ‘universal’. ‘Universal implications’ take the form that ‘if x is present in a language, then y will be as well’. For example, if a language has a dual number, it will have a plural as well. Such implications might be true of all languages, without implying that either x or y is present in all languages. Such implications took a first crucial step towards the kind of abstraction that characterizes modern approaches to language universals [12,44–46].
Greenberg and colleagues also discussed what they called ‘statistical universals’, which are of the form ‘for every language, x is more probable than y’ or ‘if a language has x, then it is more likely to have y than z’. An example of the first type is that suffixing is more common than prefixing, which in turn is more common than infixing. The second type is illustrated by the fact that, with only a few exceptions, languages that mark gender in the second person also mark it in the third person. Finally, Greenberg and colleagues highlighted the search for relationships among different universals. For example, the existence of double consonants at the beginning of a syllable implies, for all languages, the existence of single consonants (but not vice versa). Similarly, triple consonant clusters imply double consonant clusters. These two regularities are related by a more abstract rule: ‘(for n > 0), if n consonants can cluster, so can n − 1 consonants’.
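The logic of an implicational universal can be sketched in a few lines of Python. The feature inventories below are invented toy data for three hypothetical languages (labelled A, B and C), not entries from any typological database, and the function and feature names are likewise assumptions made for illustration. The sketch shows that ‘if x then y’ can hold across a sample even though neither x nor y is present in every language, and that the cluster rule amounts to a chain of such implications.

```python
# Toy feature inventories for three hypothetical languages.
# These are invented for illustration and describe no real language.
LANGUAGES = {
    "A": {"plural", "dual", "onset_C", "onset_CC", "onset_CCC"},
    "B": {"plural", "onset_C", "onset_CC"},
    "C": {"onset_C"},
}


def implication_holds(languages, x, y):
    """True if every language that has feature x also has feature y."""
    return all(y in feats for feats in languages.values() if x in feats)


def is_absolute_universal(languages, x):
    """True if feature x is present in every language of the sample."""
    return all(x in feats for feats in languages.values())


def counterexamples(languages, x, y):
    """Names of languages that have x but lack y, in sorted order."""
    return sorted(
        name for name, feats in languages.items() if x in feats and y not in feats
    )


def cluster_chain_holds(languages, max_n=3):
    """Check 'n-consonant onsets imply (n - 1)-consonant onsets' for n = 2..max_n."""
    return all(
        implication_holds(languages, "onset_" + "C" * n, "onset_" + "C" * (n - 1))
        for n in range(2, max_n + 1)
    )
```

In this toy sample, `implication_holds(LANGUAGES, "dual", "plural")` is true although neither feature is an absolute universal, while the reverse implication fails, with language B as the counterexample. A statistical universal would replace the `all(...)` test with a proportion computed over the sample.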
A different class of universals was highlighted by the linguist Charles Hockett, who reasoned that a search for universals should start by comparing human language with animal communication systems. Amplifying upon his famous ‘design features’ of human language, he argued that all spoken languages show a wide variety of universal traits (table 1), and that this combination of features is found in no other species. While some of these features would be modified today (e.g. Hockett focused only on spoken language, while today linguists agree that signed languages are full, complete human languages), many have stood the test of time. Increasing knowledge has revealed occasional exceptions to features that Hockett viewed as absolute universals, rendering them (highly probable) statistical generalizations rather than strictly present in every language. A recent example is ‘duality of patterning’. Languages use a limited set of meaningless items (phonemes) to build up a much larger set of meaningful words, and then, at a second level, recombine these words into sentences that also have meaning. Research on a recently developed Bedouin sign language suggests that this language, alone in the world, lacks such duality of patterning. But this single exception does not invalidate the regularity. Instead, it suggests that a new language must exist for more than a few generations before it develops duality during glossogeny. Furthermore, this exception offers the exciting possibility of observing and studying the emergence of a language universal, of catching glossogeny in the act of generating a design principle of language.
In summary, from its beginnings, the modern linguistic quest for language universals has sought probabilistic regularities that are abstract and implicational (rather than universally present). The authors assembled by Greenberg also saw the statement of universals as a first step in discovering the principles of language acquisition, psycholinguistics or sociology that create such static patterns, and sought to understand both regularities and the processes that generate them. Finally, they recognized that the discovery of language universals, in this extended sense of abstract cross-linguistic generalizations, particularly in comparison with communication systems in other animals, must play an important role in a biological understanding of human language.
(b) Universal grammar and Noam Chomsky
At roughly the same time, a revolution was occurring in linguistics, with the introduction of generative linguistics by Noam Chomsky and his colleagues. Chomsky broke with the previous structuralist tradition in several ways, but the most relevant here is that he emphasized the complexity of syntax, and thus the seemingly miraculous fact that every child implicitly does what generations of linguists have so far failed to achieve explicitly: learn the complete grammar of a language. Chomsky argued that the child comes into the world biologically equipped to learn language, and adapted the old term ‘universal grammar’ to denote this innate biological endowment, whatever it might be. Chomsky also highlighted its essential role in the universal ‘creative’ aspects of every language, which ‘provides the means for expressing indefinitely many thoughts and for reacting appropriately in an indefinite range of new situations’ [41, p. 6], the property that most clearly distinguishes language from other animal communication systems. Chomsky's new interpretation of the term universal grammar (henceforth abbreviated UG) thus placed the creative, productive aspect of language at centre stage.
Chomsky extended the abstraction of the term universal even further than Greenberg and colleagues, recognizing two further categories of abstract universal. ‘Substantive universals’ make claims about the inventory of units from which a language is built. For example, structuralist phonologists argued that all phonemes of all languages are built up of a small set of distinctive features (such as voiced/unvoiced) and the Port Royal Grammarians suggested that all languages must have nouns and verbs. Chomsky further suggested ‘that each language will contain terms that designate persons or lexical items referring to certain specific kinds of objects, feelings, behaviour, and so on’ [41, p. 28]. Substantive universals are regularities at a relatively superficial descriptive level.
Chomsky also highlighted a second more abstract type of universal. ‘Formal universals’ involve the types of rules and regularities that can occur in a language, and the ways in which they can interact. In syntax, for example, a core idea of generative grammar is that phrases and sentences have a tree-like structure: they cannot be fully understood as simple strings of words. An example of a formal universal would be that syntactic rules apply to such trees (rather than, say, serial word order) and thus that syntactic rules need to be stated in structural rather than serial terms. At the semantic level, Chomsky proposed ‘that proper names … must designate objects meeting a condition of spatio-temporal contiguity’ or that ‘colour words of any language must subdivide the colour spectrum into continuous segments’ as examples of plausible formal universals. Note that there is no restriction in these examples to syntax, nor stipulation that such formal universals are somehow encapsulated to language: the colour example clearly involves an interface to the sensory world of vision to even be meaningful. Indeed, Chomsky emphasized that ‘we do not, of course, imply that the functions of language acquisition are carried out by entirely separate components of the abstract mind or the physical brain’ and that ‘it is an important problem for psychology to determine to what extent other aspects of cognition share properties of language acquisition and language use … to develop a richer and more comprehensive theory of mind’ [41, p. 207]. Thus, despite a possible connotation that universal grammar is specific to syntax, or to language more broadly, Chomsky specifically denied any strict separation of language and other aspects of the human mind in his re-introduction of this term. The notion that UG concerns only syntax is probably the most pernicious of a number of common misinterpretations of UG; see ch. 4 of Jackendoff  for a more complete list, and rebuttals.
UG is thus nothing more or less than an abstract characterization of the human language faculty (FLB)—the instinct to learn language—including all of its mechanisms and their interactions. It is unsurprising that the last 40 years have seen considerable debate concerning its nature: we would not expect the formidable task of characterizing this key element of human cognition to yield easily to linguistic research. Thus, many researchers united in their search for the innate basis of the FLB have offered diverse approaches to linguistic theory, representing different theoretical gambits concerning the contents and nature of this faculty. Chomsky's most recent tack is dubbed ‘The Minimalist Programme’  because it seeks to reduce those aspects of the human mind that are specific to language and syntax to a bare minimum, perhaps as little as one powerful operation called ‘Merge’. Most other universal features of language acquisition would then result from other aspects of the human mind (cognitive, perceptual or motor skills), or from the interactions of these cognitive mechanisms with this minimal syntactic core.
In contrast, more elaborate models of UG posit an extensive suite of human- and language-specific mechanisms, running the gamut from speech perceptual and vocal tract adaptations to high-level syntactic structures [14,50,52]. An increasingly popular formalism called ‘optimality theory’ [53,54] posits an innate set of constraints on language and proposes that language acquisition requires the developing child to implicitly rank these constraints. Radical construction grammar proposes that abstract universals will only be found ‘in the patterned variation of constructions and the categories they define’ [55, p. 5]. Numerous theorists have suggested that universals result from processing or other ‘performance’ constraints (cf. [24,45]), while Levinson and colleagues cite conversational constraints upon turn-taking as plausible universals. Finally, some approaches to linguistics suggest that essentially nothing in the FLB is specific to language (see the collection in Tomasello). Such ‘cognitive’ or ‘functional’ approaches are often favoured by psychologists or anthropologists, who reject the notion that the toolkit of language acquisition and processing includes any ‘tools’ specific to language. Although proponents of such approaches often strongly reject the term universal grammar, cognitive universals spanning beyond language are nonetheless part and parcel of the traditional search for universal aspects of the human language faculty and their biological bases.
As emphasized in the useful overview of Jackendoff , such diversity of opinion is to be expected, and is a healthy sign of science at work. When scientists reach broad agreement about the nature of the FLB, the constraints that our innate endowment places on human languages and the manner in which this endowment aids the child in language acquisition, we will have solved some of the most fundamental problems in human biology. It would be naive to expect such a holy grail to yield quickly or easily to scientific research. To give some sense of the state of play, I have listed a number of proposed features of universal grammar in table 2. These are not intended to be either exhaustive or necessarily self-consistent, but rather to provide a sense of the kinds of features and issues that are currently being debated. Many of these universals have at least one language that appears to be an exception (cf. ), though many exceptions are debated by other experts (cf. the commentaries on that article). It can hardly be doubted that this debate will continue for many more years.
In summary, the search for linguistic universals has proceeded from the eighteenth-century assumption of a rather superficial list of features common to languages (every language has words, every language has nouns and verbs) to a far more abstract set of generalizations and regularities about the human language faculty, and the biological endowment that a human child uses to acquire language [41,42]. These regularities will certainly incorporate more general aspects of cognition, including aspects of perception, motor control or conceptual structure that predated language in human evolutionary history. From this abstract perspective, UG is not reducible to a list of properties universally found in every language, nor does its existence imply such a list. As Jackendoff  puts it, UG is a characterization of the toolkit the child uses in language acquisition, not a list of universal features of adult languages. Jackendoff emphasizes that ‘not every mechanism provided by universal grammar appears in every language’ since ‘when you have a toolkit, you are not obliged to use every tool for every job’. It is quite unfortunate, then, that many critics have conflated UG and surface language universals, and proffered the discovery of exceptions to some broad regularity as a refutation of UG (e.g. [13,59]). As Roman Jakobson, a tireless defender of the search for universals, pointed out, ‘a rule requiring amendment is more useful than the absence of any rule’ [60, p. 147]. The notion of UG is perfectly compatible with a very broad range of linguistic diversity, evolving via cultural processes, and indeed has developed over many decades with precisely this diversity in mind.
(c) The diversity of human languages
Within the broadly defined and still incomplete set of commonalities and regularities discussed above, the diversity of existing human languages is quite astounding. The closest non-human analogue to this culturally transmitted diversity comes from the song systems of some songbirds (e.g. mimic thrushes like the brown thrasher [61,62]) or humpback whales [63–65], but I know of no animal communication system that comes close to matching the range of diversity in the more than 6000 existing human languages (Ethnologue currently reports 6909: www.ethnologue.com). Diversity itself is an important aspect of the biology of language, clearly tied to the learned, culturally transmitted aspects of human language.
Within these broad constraints, virtually every aspect of human language is variable. A fundamental difference is modality, which varies between spoken languages and over 100 signed languages, expressed via manual and facial movements. Signed and spoken languages are equivalent in their complexity and expressive power, despite using completely different input/output mechanisms [66–68]. Although many animal communication systems contain both visual and auditory components, there is no non-human system in which one modality can be completely replaced by another and yet convey identical messages .
In the domain of sound systems, all spoken languages include consonants and vowels, but there is huge variation in the number of phonemes, from 11 to roughly 150 [13,70]. Among vowels, many of the world's languages have only three vowels, and the mean number is five [71,72], making the English vowel system rather rich with its 15 or so vowels (despite our writing system making do with six). Consonants are even more variable in number and type .
Nonetheless, the diversity of human vowel systems is underlain by well-understood regularities. Vowel systems provide an excellent model system for understanding the interactions between cultural transmission, communicative efficiency and universality. Across many languages, the distribution of vowels in formant space changes systematically as vowel number increases. This pattern can be duplicated by a simple mathematical model of energy-optimized intelligibility . Computer simulations that explicitly model glossogeny converge on a set of vowel patterns quite similar to those observed in real languages [75–77], suggesting that cultural transmission plays a central role, though always within biologically imposed limits. These universal regularities in vowel systems can be understood as resulting from an interaction between biologically given aspects of human audition and vocal production (the ear and vocal tract) with constraints of communication, intelligibility and ease of production, and optimized over many generations. Vowel systems are thus one of several abstract universals that derive from an interaction of biologically given and glossogenetic forces; they illustrate the futility of attempts to assign such aspects of language to one or the other of these categories.
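The idea that vowel inventories spread out to stay distinguishable can be illustrated with a small dispersion sketch. This is not a reimplementation of the model or simulations cited above. It assumes, purely for illustration, that vowels are points in a normalized two-dimensional square standing in for formant space (real vowel spaces are shaped by the vocal tract and ear), that confusability between two vowels falls off as the inverse square of their distance, and that a simple random hill-climb is an adequate stand-in for optimization over generations. All function and parameter names are hypothetical.

```python
import random
from itertools import combinations


def energy(points):
    """Sum of inverse squared distances over all vowel pairs (lower = more dispersed)."""
    total = 0.0
    for (x1, y1), (x2, y2) in combinations(points, 2):
        d2 = (x1 - x2) ** 2 + (y1 - y2) ** 2
        total += 1.0 / max(d2, 1e-12)  # guard against coincident points
    return total


def disperse(n_vowels, steps=20000, step_size=0.05, seed=0):
    """Hill-climb n_vowels points in the unit square towards lower energy.

    Returns (points, starting_energy, final_energy).
    """
    rng = random.Random(seed)  # fixed seed keeps the run reproducible
    points = [(rng.random(), rng.random()) for _ in range(n_vowels)]
    start = energy(points)
    best = start
    for _ in range(steps):
        i = rng.randrange(n_vowels)
        x, y = points[i]
        # Propose a small move, clamped to the unit square (the toy vowel space).
        nx = min(1.0, max(0.0, x + rng.uniform(-step_size, step_size)))
        ny = min(1.0, max(0.0, y + rng.uniform(-step_size, step_size)))
        candidate = points[:i] + [(nx, ny)] + points[i + 1:]
        e = energy(candidate)
        if e < best:  # keep only moves that lower the energy
            points, best = candidate, e
    return points, start, best
```

Calling `disperse(3)` and then `disperse(5)` gives a rough feel for how the preferred layout changes as the number of vowels increases. Because the search only accepts improving moves, the final energy is never higher than the starting energy, but the result is a local optimum of this toy objective and should not be read as a prediction about any real language.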
Words and their internal morphological structure are one of the most variable aspects of language. ‘Morphemes’ are meaningful units of language; they include free morphemes (words like ‘dog’ or ‘bark’) and bound morphemes that must be attached to other morphemes, like the English ‘-ed’ marking past tense, or ‘-s’ marking plurals. These morphemes can be combined to form multi-morphemic words like ‘dogs’ or ‘barked’. So-called ‘isolating’ languages (e.g. Chinese) lack such morphological processes almost entirely, while ‘polysynthetic’ languages have vast complex stores of bound morphemes serving functions that, in English, are accomplished by adjectives, adverbs or syntax . Such languages are widespread, including Ainu in Japan, Chukchi in Siberia and Mohawk and many other Native American languages . In most of these languages, a single ‘word’ can express complex meanings that in English or other European languages would require an entire phrase or sentence.
Turning to syntax, while the word classes ‘noun’ and ‘verb’ appear to be universal, some languages appear to lack such familiar classes as adjectives and adverbs. Further, there are important word classes in other languages that seem unfamiliar to Europeans, such as ‘classifiers’ or ‘coverbs’ (cf. [13,55,80]). Other languages take the onomatopoeia expressed in English words like ‘meow’ or ‘moo’, or the sound symbolism in words like ‘glitter’, ‘gleam’, ‘glisten’, ‘glimmer’ (for shimmering light) to a far more complex and productive level. Such syntactically peculiar ‘ideophone’ systems [81,82] can include thousands of items (e.g. Japanese ‘doki doki’ for ‘heart-pounding excitement’).
At the level of semantics, languages obviously vary considerably in words involving technology: such nouns as ‘keyboard’ or ‘laptop’ are recent English acquisitions, while older nouns like ‘calash’ and ‘futchel’ (parts of horse-drawn carriages) have virtually disappeared in 100 years. Beyond such superficial variation in the lexicon, languages vary considerably in their colour system or number system (although virtually all languages distinguish ‘one’, ‘two’ and ‘many’, and colours follow universal patterns [83,84]). For spatial vocabulary, some languages use absolute references rather than locally defined spatial terms to denote location: rather than saying ‘the chair on your right’ they would say ‘the chair to your north’ .
Finally, at a pragmatic level, there can be huge variation within a single language in terms of the words, syntax and even phonetics used by men and women, or language used between social equals versus between dominant and subordinate individuals. The common distinction in European languages between informal and formal ‘you’ (e.g. ‘tu/vous’ in French or ‘du/Sie’ in German) pales in comparison to the extensive differentiation found in Japanese or many other languages.
Although this brief overview gives only a taste of the kind of variation seen among languages, it shows that many ‘universal features’ one might guess at, based on their ubiquity in European languages, are not shared by many other languages in the world. This fact led many of the early American linguists engaged in documenting Native American languages to believe in essentially unconstrained variation. Nonetheless, for all of the examples above, linguists have uncovered regularities revealing constraints on the form of possible human languages. We now turn to the mechanisms underlying these regularities.
3. A biological perspective on language diversity
A tension between diversity and universality is a long-running theme in biology. For example, a distinction is often made in systematics between ‘lumpers’ who, recognizing the fundamental affinities of a clade, combine them in one group, and ‘splitters’ who, emphasizing the differences, split them into multiple groups. A similar distinction can be made among students of language. Nothing of deep significance rests on this distinction, because a fundamental contribution of Darwin's notion of ‘descent with modification’ is that evolution generates groups of organisms related in a tree-like fashion. It is essentially a matter of taste whether one emphasizes the twigs or the main branches; both are important and both need to be recognized and studied. These observations are as true of glossogeny, the cultural evolution process that generates languages, as for biological evolution, and indeed many of the same tools can thus be fruitfully used to analyse them [25,86,87].
An analogy to the diversity and unity of languages is provided by features of our own vast subphylum, the vertebrates. Universal vertebrate features are encompassed in the notion of a Bauplan: a ‘body plan’ that includes (or included during development) a notochord running down the spine, and bony vertebrae built around it. To this are attached ribs and generally appendages. A mouth at the front of the animal serves for both food and respiration, and is followed by branchial arches forming jaws, gills or other diverse structures. Many other shared traits also characterize most vertebrates, but these few suffice to make the point: each of these traits is absent or modified in one or a few species, but this does not render the notion of the body plan vacuous. So, for example, snakes have lost their limbs and sharks and rays have lost their bony skeleton . In much the same way, we expect the ‘basic body plan’ of language to have certain characteristics that are common or even ubiquitous, but should not be surprised to find exceptions to some or even all of the ‘standard’ characteristics. Thus, when scholars cite unusual languages as a refutation of the entire concept of UG (e.g. [13,59]), they both overlook the nature of biological systems, which typically allow exceptions, and ignore many explicit hypotheses about UG that have been offered over the years.
(a) ‘General’ versus ‘specialized’ mechanisms as a false dichotomy
Much of the current debate within linguistics concerning universals centres not on whether some regularities, suitably abstract or statistical, exist. All commentators agree the answer is yes, perhaps with occasional exceptions. The arguments concern whether these result from cultural or biological factors, and if biological whether the underlying mechanisms are specific to language or result from some more general cognitive constraints (e.g. the vocal or auditory apparatus, pragmatics, functional constraints on communication, or limitations of short-term memory). Given the fact that human cultural capacities themselves rest upon a unique biological basis, the debate actually hinges on a distinction between ‘general cognitive’ and ‘specifically linguistic’ neural mechanisms in our species.
I suggest that from a biological viewpoint this distinction is unproductive and misleading, and that the debates surrounding it have led cognitive science down a blind alley. Whether we consider neural mechanisms underlying language, the genetic mechanisms that allow them to develop reliably in our species or the evolutionary factors that led to them, the ‘language-specific’ versus ‘general cognitive’ distinction becomes vague and unhelpful. This is not, of course, because the study of such neural and genetic mechanisms, or the developmental, cultural and evolutionary processes that generate them, is vague or meaningless; quite the contrary. Rather, it is because the interwoven causal forces that underlie these mechanisms and processes do not admit of simple explanations, where each outcome is associated with a single reified ‘cause’ or ‘function’. Development involves cycles of causation, where variables that are initially effects later act back upon their previous causes. A cascade of such cyclically causal complexes allows initially simple systems to differentiate and increase in complexity. This epigenetic perspective allows resolution of many otherwise paradoxical observations, but demands that we relinquish simple linear notions of causality implicit in traditional preformationist and/or instructivist models . Adult mechanisms will not be explained in terms of simple, singular ‘original causes’, whether functional, developmental or evolutionary.
To illustrate, consider a few well-defined mechanisms involved in spoken language. First, the capacity for vocal imitation, unique to humans among primates, appears to rest on the existence of direct connections between lateral motor cortex and the motor neurons serving the larynx, tongue and respiratory muscles (reviewed in ). Such connections exist in humans and not other primates , but comparable connections also exist in vocally imitating birds [92,93]. The capacity for vocal imitation, and thus this neural mechanism, is a central requirement for culturally shared spoken language. Can we thus say that this mechanism ‘evolved for’ spoken language? Not necessarily: increased vocal control and imitation of vocalization also play a central and necessary role in human song . While some scholars have argued that song, or music in general, is a non-adaptive, unselected by-product of language (e.g. ), others since Darwin have suggested that music evolved before, and paved the way for, spoken language [1,96]. Thus, the question of whether direct vocal-motor connections are specifically ‘for’ language or not hinges on a debate about original function that is very difficult to resolve empirically, rather than any facts about the current function or mechanistic basis of human vocal control. In any case, the mechanism is shared both with song and with other species, and is squarely part of FLB.
A genetic example is provided by the FOXP2 gene, which plays a key role in the control of complex, sequential oral and facial movements in human speech . The gene itself represents an ancient transcription factor, widely shared among vertebrates, and the human version contains two amino acid differences that are shared by virtually all humans and not present in chimpanzees or other primates . Mutations in the gene in human clinical cases lead to severe vocal motor apraxia and speech deficits . Is the human allele of FOXP2 ‘for’ language? Proponents would cite the specificity of the mutated gene's effects in humans: it specifically and severely affects speech, and not singing, or other more general aspects of cognition . Sceptics would point out that FOXP2 is also expressed in the lungs and other tissues, that it also affects non-speech control of the mouth (especially complex sequences of movements) and that speech is not language. While FOXP2 is expressed in traditional cerebral ‘language areas’, it is also expressed in cerebellum and basal ganglia . Finally, FOXP2 plays a role in bird song learning [102,103], again placing it squarely in the FLB. Nonetheless, it seems likely that the selective sweep that drove the new, human allele of FOXP2 to fixation in the hominid population leading to modern humans had something to do with its role in human spoken language (cf. ). But again, this specific genetic mechanism defies simplistic attempts at functional categorization as general versus specialized. A similar point might be made about recent suggestions that intraspecific variation in genes associated with brain development might subtly affect the propensity of a population, over many generations, to adopt a tonal language . If true, this link need not imply that these genes are ‘for’ language in any meaningful sense.
As a final example, consider ‘Broca's area’, a region of inferior frontal cortex whose destruction in adult humans typically causes severe aphasia. Although Broca originally considered this brain area to be specific to speech production, research on aphasics in the 1970s suggested that the region also plays a central role in syntax perception (e.g. ), a conclusion that has been verified and extended by modern brain imaging research (e.g. ). Nonetheless, brain imaging work using different protocols has provided ample evidence that parts of this region play a role in non-linguistic cognitive processes, loosely captured by the notion of ‘switching’ and cognitive control , while its right-hemisphere homologue appears to play a role in music perception [109,110]. Furthermore, it is clear that both the cognitive and linguistic functions normally subserved by Broca's area can be accomplished by other brain regions in cases of early brain damage . That Broca's area is involved in general cognition, in addition to its linguistic functions, might be taken to suggest that its linguistic specializations are a subset of more general, and presumably more primitive, cognitive functions. Again, however, it is difficult to determine whether the non-linguistic functions of this region (cognitive switching or music) are non-adaptive by-products of some originally linguistic function, or whether the linguistic functions are specializations of some more general capacity. Furthermore, it is unclear why resolving this point should be a central concern of those interested in understanding the computations performed by this region of cortex, the core concern of neurolinguistics (cf. ).
What all of these examples make clear is that the distinction between general and linguistically specialized mechanisms is hard to draw, even in those cases where the mechanisms themselves seem fairly clearly defined. Most areas of language are not, and will not soon be, so clearly defined, and thus the distinction itself is of little use in furthering our understanding of the mechanisms. The same is true, even more so, for debates about the original function of these mechanisms (cf. ). Thus, the long-running arguments surrounding such distinctions seem likely to continue generating much heat and little light, and to obscure the more basic empirical issues of what the basic mechanisms underlying language are, how they function at physiological and computational levels and whether or not they are shared with other species. Neither the original meaning of the term universal grammar, nor Chomsky's later re-deployment of the term in its modern UG guise, depends on the degree of linguistic specialization of the universal constraints that act on the development of human language. Even the question of human specificity is irrelevant to whether a given cognitive mechanism plays a universal role in structuring human language: indeed the more ancient and widely shared constraints (e.g. limited short-term memory) are the most likely to play a central and universal role in structuring languages. Core mechanisms underlying language can be innate and universal among humans without being unique either to language or to our species.
4. Synthesis: a formal perspective on unity and diversity
The preceding review indicates both that abstract regularities concerning every aspect of language exist, and that the diversity of languages within these broad constraints is considerable, dwarfing that found in other animal communication systems. These facts demand a perspective on the biological nature of language that encompasses both unity and diversity. I have already suggested that the notion of a body plan provides one analogy for this kind of ‘diversity within unity’, and recent progress in evolutionary developmental biology offers clear examples where traditional notions of Baupläne can be cashed out in terms of HOX genes governing axial segmentation and specification [113,114]. Similarly, the diversity and unity of the tetrapod hand  can be understood in terms of the shared transcription factors regulating limb growth [116,117]. Many more examples of this kind are sure to follow, and enlightening genetic and developmental data are accumulating rapidly. Baupläne, and the general constraints they imply, are real, and can be understood mechanistically in terms of developmental processes. The parallel with UG and particular languages seems unmistakable, and has informed linguistic thinking since the birth of generative linguistics [41,42]. Thus, it is perhaps not premature to seek a more general theoretical framework within which diversity and unity, in both biologically and culturally evolving systems, can be fruitfully integrated.
I suggest that the general notion of abstract constraints, operating ubiquitously during the development of a system in time and space, provides one such framework (figure 1). Such systems are familiar: the theory of differential equations is a rich body of mathematics exploring such constraints. A differential equation is simply one that expresses the relationship between a variable and one or more of its derivatives as they change in time, and sometimes space. Indeed, such equations would be more transparently termed ‘derivative-based equations’ . Differential equations exist in many forms, but in general they are among the fundamental mathematical tools used by physicists: Newton's Laws, Maxwell's equations, the wave equation and a vast array of other equations central to all branches of physics and biology are expressed as differential equations. A differential equation like x″ = ax expresses a constraint on the movement of an object: its acceleration x″ must be proportional to its location x. In general, there are an infinite number of specific paths that could satisfy this constraint. If we denote a particular path or form of movement as a function of time x(t), we can ask whether or not this function satisfies the constraint(s) embodied in the original equation. If so, it is termed a ‘particular solution’. Because there are an infinite number of solutions, we can think of this differential equation as defining a vast family of solutions, some of which may be superficially very different, but all of which have in common that they satisfy the constraint defined by the original equation. In some cases, we can discover a broader ‘general solution’ (e.g. periodic oscillation, when the constant a is negative) that encompasses an entire set of specific, particular functions (box 1).
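As a worked illustration of the constraint x″ = ax, the family of paths that satisfy it can be written in closed form. The path is written here as a function of time, x(t). The constants A and B (fixed by the initial position and velocity) and the symbols ω and k are notational assumptions introduced for this sketch; they do not appear in the original discussion.

```latex
\[
x''(t) = a\,x(t)
\quad\Longrightarrow\quad
x(t) =
\begin{cases}
A\cos(\omega t) + B\sin(\omega t), & a = -\omega^{2} < 0,\\[4pt]
A + B\,t, & a = 0,\\[4pt]
A\,e^{kt} + B\,e^{-kt}, & a = k^{2} > 0.
\end{cases}
\]
```

Each choice of A and B picks out one particular solution, while the expression with A and B left free is the general solution. Only the first case, with negative a, yields periodic oscillation.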
Box 1. General and specific solutions for an ordinary differential equation.
Figure 1a gives the differential equation y′ = ry(M − y), where y is a function of time, and y′ denotes the first derivative with respect to time. This is an example of a ‘logistic’ equation, often used in modelling growth. It is simple, but approximates in a general way many developmental or ecological growth processes. Figure 1b illustrates the constraints on a general solution to this equation by the arrows, which indicate what the slope (y′) of the function must be at each point. Parameters determining a particular solution include initial conditions and boundary conditions. One particular solution is shown as the black S-curve in figure 1b, with an initial value of y just above 0 (at exactly y = 0 the slope is zero and the solution would remain at zero).
Figure 1c illustrates a selection of particular solutions, from the infinite set of such solutions, each starting with a different initial y, but fulfilling the same overall constraints. While a splitter might look at figure 1c and see a group of categorically different functions (e.g. descending versus increasing), the lumper would search for commonalities, and in this case, would find them in the general solution to the underlying differential equation (figure 1b).
Although such a first-order model is obviously trivially simple compared with any actual biological system, it provides a well-understood mathematical metaphor for the kind of formal framework required to conceptually integrate a diversity of surface structure with unity of the underlying process.
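The relationship between particular and general solutions in box 1 can be sketched numerically. The following minimal Python example integrates y′ = ry(M − y) from several starting values and compares each result with the standard closed-form solution of the logistic equation, y(t) = M·y0 / (y0 + (M − y0)·e^(−rMt)). The parameter values, the set of initial values and all function names are illustrative assumptions, not taken from the article or its figure.

```python
import math


def logistic_rhs(y, r, M):
    """Right-hand side of the constraint y' = r*y*(M - y)."""
    return r * y * (M - y)


def integrate_rk4(y0, r, M, t_end, steps):
    """Integrate y' = r*y*(M - y) from t = 0 to t_end with classical RK4."""
    h = t_end / steps
    y = y0
    for _ in range(steps):
        k1 = logistic_rhs(y, r, M)
        k2 = logistic_rhs(y + 0.5 * h * k1, r, M)
        k3 = logistic_rhs(y + 0.5 * h * k2, r, M)
        k4 = logistic_rhs(y + h * k3, r, M)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return y


def general_solution(t, y0, r, M):
    """Closed form: y(t) = M*y0 / (y0 + (M - y0)*exp(-r*M*t))."""
    return M * y0 / (y0 + (M - y0) * math.exp(-r * M * t))


# Hypothetical parameter values chosen only for illustration.
R, M_CAP, T_END = 0.5, 2.0, 10.0
INITIAL_VALUES = [0.05, 0.5, 1.5, 3.0, 4.0]

# Each initial value selects one particular solution of the same equation.
results = {
    y0: (
        integrate_rk4(y0, R, M_CAP, T_END, 2000),
        general_solution(T_END, y0, R, M_CAP),
    )
    for y0 in INITIAL_VALUES
}

for y0, (numeric, exact) in results.items():
    direction = "increasing" if y0 < M_CAP else "descending"
    print(f"y0={y0:4.2f} ({direction}): numeric={numeric:.6f} closed-form={exact:.6f}")
```

Curves that start below M rise and curves that start above M fall, yet a single formula with one free parameter (the initial value y0) covers all of them, which is the sense in which the lumper's general solution unifies the splitter's categories.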
The parallel with language is clear: particular languages correspond to specific solutions to the constraints imposed by human biology on language acquisition and historical change. Initially, a central task for studies of language diversity will be to find statistical abstractions that encompass the range of linguistic variability (cf. [13,119]). The search for universals is akin to the search for a general solution that encompasses all of these particular solutions, and the goal of biolinguistics is to understand, and make explicit, the specific biological constraints that underlie this general solution. Of course, we expect many such constraints to interact with each other over developmental, historical and evolutionary time . Chomsky has recently suggested that historical factors, like the Norman Conquest for English, probably play a central role in generating such diversity . These interacting systems entail dauntingly complex systems of partial differential equations involving genes and the epigenetic control of their expression, brains and their self-wiring depending on the organism and its environment, and individuals as part of cultural systems.
Although at present I offer this parallel as a metaphor, it will become more than that as these systems become better understood. There can be little doubt that the mathematics of biological and cultural change will rely heavily on differential equations. Unfortunately, when it comes to the systems of nonlinear partial differential equations that typify real biological systems, there is no guaranteed way to find general solutions. In complex, real-world examples, nature provides a few examples of particular solutions, and the hard work is to find the constraints underlying such solutions and, perhaps, to discern general solutions. Systems of interacting nonlinear equations can exhibit sensitive dependence on initial conditions, bifurcations and chaos. Understanding the attractors that constitute general solutions in such systems represents a daunting frontier for theoretical biology [121,122]. Both top-down approaches (invoking cultural and historical factors) and bottom-up or ‘reductionist’ approaches (e.g. gene or brain-focused research) will be important for a full characterization of this complex system . No one expects such a task to be easy. Equally, no one can deny the fundamental significance of the search.
To conclude, I have suggested that progress in understanding the biological constraints underlying human language must, of course, attend to the vast diversity of human languages, which provide crucial insights into the range of particular solutions to the problems language poses. But such progress also requires a search for universals, in the abstract sense of cross-linguistic generalizations that has always been understood in modern linguistics [12,41,50,60]. This is equivalent to seeking the general solution encompassing these particular solutions. This search, even when incomplete, will provide essential fodder in the search for the underlying biological constraints. Rejections of the search for universals, based on a few exceptions to some otherwise universal rule, miss the point of this endeavour. Arguments about whether the constraints are general to cognition, or specific to language or to humans, are in my opinion unlikely to help resolve the substantive biological issues involved in understanding the FLB. Nor will an attempt to divorce cultural processes from linguistic or biological processes help: the very capacity for culture has a strong biological basis in our species, and human cultural evolution is intimately bound up with language itself. While drawing distinctions between such categories may prove heuristically useful in some cases, treating them as dichotomies will simply impede progress. Future progress will require integrated discussions of language diversity and the underlying unity of the instinct to learn language. As the neural and genetic data continue to flow in, we will increasingly need conceptual frameworks encompassing both diversity and unity, rather than dichotomies that polarize them.
I thank William D. W. Fitch, Daniel Everett, Stephen Levinson, the editors and three anonymous reviewers for comments on an earlier version. Writing was supported by ERC Advanced Grant SOMACCA to the author.
One contribution of 14 to a Theme Issue ‘Evolution and human behavioural diversity’.
- This journal is © 2011 The Royal Society