Evolution of lactase persistence: an example of human niche construction

Pascale Gerbault, Anke Liebert, Yuval Itan, Adam Powell, Mathias Currat, Joachim Burger, Dallas M. Swallow, Mark G. Thomas


Niche construction is the process by which organisms construct important components of their local environment in ways that introduce novel selection pressures. Lactase persistence is one of the clearest examples of niche construction in humans. Lactase is the enzyme responsible for the digestion of the milk sugar lactose and its production decreases after the weaning phase in most mammals, including most humans. Some humans, however, continue to produce lactase throughout adulthood, a trait known as lactase persistence. In European populations, a single mutation (−13910*T) explains the distribution of the phenotype, whereas several mutations are associated with it in Africa and the Middle East. Current estimates for the age of lactase persistence-associated alleles bracket those for the origins of animal domestication and the culturally transmitted practice of dairying. We report new data on the distribution of −13910*T and summarize genetic studies on the diversity of lactase persistence worldwide. We review relevant archaeological data and describe three simulation studies that have shed light on the evolution of this trait in Europe. These studies illustrate how genetic and archaeological information can be integrated to bring new insights to the origins and spread of lactase persistence. Finally, we discuss possible improvements to these models.

1. The biology of lactase persistence

(a) Niche construction, lactase persistence phenotype and genotypes

In biological evolution, natural selection acts on traits that are heritable. The vast majority of these traits are inherited by the transmission of DNA sequences from parent to offspring, i.e. genetic inheritance. But other aspects of the biology of an organism can be inherited extra-genetically, such as certain culturally transmitted behaviours [1,2] and features of the environment that have been shaped by ancestral populations [3]. Since this kind of extra-genetic inheritance can play a key role in the survival of an organism and the evolutionary trajectory of a species, it has been assigned the term ‘niche construction’ [4,5]. A deeper understanding of the relationship between genetic evolution and niche construction can come from evolutionary theory, notably by recognizing that humans are far from unique in their ability to change their own selective environments [6]. However, because human culture has strongly modified our environments with such remarkable ecological and evolutionary consequences [2], human gene–culture coevolution provides some of the clearest and most spectacular examples of niche construction.

Gene–culture coevolutionary theory integrates cultural variation in the analysis of differential transmission of genes from one generation to the next [7]. This theory can be explicitly used to explore the evolutionary consequences of niche construction when a stable transmission of learned information is conveyed between successive generations [8,9]. Indeed, cultural processes can change the human selective environment and thereby affect which genotypes survive and reproduce [1,2]. If the cultural inheritance of an environment-modifying human activity persists for long enough to generate a stable selection pressure, it will be able to co-direct human evolution. There are many examples of this in human evolution [1–3,10] but none are so well studied, clear-cut, widespread and well supported as the coevolution of lactase persistence (LP) and dairying [4,11–13].

Lactose is the main carbohydrate in milk and is a major energy source for most young mammals. The enzyme responsible for hydrolysis of lactose into glucose and galactose is lactase (or lactase-phlorizin-hydrolase, LPH). Without this enzyme, mammals are unable to break down and thus use lactose, and since milk is the essential component of young mammals' diet, lactase activity is fundamental to the early development of most mammals. After the weaning period is over, lactase production usually declines, although the mechanisms and evolutionary reasons for this downregulation are not fully understood. However, some humans continue to express lactase throughout adult life, and are thus able to digest the lactose found in fresh milk. This trait is called LP.

The LP trait is found in around 35 per cent of adults living in the world today [14,15], but its frequency varies widely among human populations, both between and within continents (figure 1a). High frequencies of LP are generally observed in northern European populations. Indeed, LP frequency can vary from 15–54% in eastern and southern Europe to 62–86% in central and western Europe, and to as high as 89–96% in the British Isles and Scandinavia [15,16]. In India, LP frequency is higher in the north (63%) than further south (23%) or east [17]. There are relatively few data on East Asians but it seems that the trait is rare there. In Africa, the distribution of LP is very patchy, with high frequencies being observed mainly in traditionally pastoralist populations [14,18–20]. For example, LP reaches 64 per cent in Beni Amir pastoralists (Sudan), whereas in Dounglawi (Sudan), a neighbouring non-pastoralist population, LP frequency is around 20 per cent [11,21,22].

Figure 1.

Interpolated maps of the distribution of LP and the −13910*T allele in the ‘old world’. (a) LP phenotype distribution. Data points (dots) were taken from the literature (see text and [14] for details). (b) Distribution of the allele −13910*T, associated with LP. Dots represent sample data taken from a previous review [14,26–30]; crosses represent data for new locations not previously tested and diamonds correspond to locations where additional data have been added. Regularly updated frequency data are available at the http://www.ucl.ac.uk//mace-lab/GLAD/ website.

In recent years, a number of single nucleotide polymorphisms (SNPs) have been found in association with the LP trait in different populations. The first to be identified, −13910*T, is found not in the LCT gene (the lactase gene) but within an intron of a neighbouring gene, MCM6 [23]. In vitro studies indicate that this nucleotide change affects lactase promoter activity [24,25] and thus is highly likely to cause LP, although it is currently not possible to exclude tight linkage disequilibrium with another, as yet unobserved, functional variant. An interpolated map showing the global distribution of −13910*T can be seen in figure 1b, using published data [14,15,26–30] supplemented with recently collected data. A cursory comparison of figure 1a,b shows that while the −13910*T allele may explain the distribution of LP in Europe, it cannot explain the distribution of LP in Africa or the Middle East. Indeed, Mulcare et al. [31] showed this formally using a robust statistical framework, and other studies have identified additional LP-associated alleles that explain much of the distribution of LP in Africa [16,20,32]. Interestingly, all these variants are located within 100 nucleotides of −13910*T in the same intron of the MCM6 gene, a region that is functionally important for the expression of lactase in vitro [24,33]. Because these various LP-associated alleles are found on several different haplotypic backgrounds [14,20,34], it is now clear that LP has evolved multiple times and is thus an example of convergent evolution [35].

Using genetic variation in regions surrounding LCT, it is possible to obtain estimates of the age of specific LP-associated alleles. Dates of origin for −13910*T ranging between 2188 and 20 650 years ago [36], and between 7450 and 12 300 years ago [37] have been obtained using extended haplotype homozygosity (EHH) and variation at closely linked microsatellites, respectively. Similar dates (1200–23 200 years old) were also obtained for one of the major African variants (−14010*C) using EHH [20]. These date estimates are remarkably recent for alleles that are found at such high frequencies in multiple populations. It is easy to envisage recent alleles being rare since they change in frequency slowly, and in a directionless way, by genetic drift. However, a recent allele that has reached such high population frequencies requires more than genetic drift alone; it requires the extra ‘kick’ of natural selection. Indeed, the estimated selection strengths required to explain the age/frequency distributions of −13910*T [36]—and of −14010*C [20]—are enormous (1.4–19 and 1–15%, respectively), which are among the highest estimated for any human genes in the last approximately 30 000 years [36,38].
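The link between allele age, frequency and selection strength can be illustrated with a minimal deterministic sketch. It is not the method used in the cited studies; it assumes an infinite population (no drift), a dominant advantageous allele whose carriers have fitness 1 + s relative to 1 for non-carriers, and a hypothetical starting frequency. The function names and parameter values are illustrative only.

```python
def next_frequency(p, s):
    """One generation of selection on a dominant allele.

    Assumption: carriers (AA, Aa) have fitness 1 + s, non-carriers (aa) have
    fitness 1, and genotypes are in Hardy-Weinberg proportions before selection.
    """
    q = 1.0 - p
    mean_fitness = 1.0 + s * (1.0 - q * q)
    return p * (1.0 + s) / mean_fitness


def generations_to_reach(p0, target, s, max_generations=100_000):
    """Number of generations for the allele to rise from p0 to target.

    Returns None if the target is not reached within max_generations
    (for example when s = 0, where the frequency never changes).
    """
    p = p0
    for generation in range(max_generations + 1):
        if p >= target:
            return generation
        p = next_frequency(p, s)
    return None


if __name__ == "__main__":
    # Hypothetical start at 1% and target of 50%; s values span the
    # 1.4-19% interval quoted in the text for -13910*T.
    for s in (0.014, 0.05, 0.19):
        print(s, generations_to_reach(0.01, 0.5, s))
```

Without selection the deterministic frequency does not move at all, which is why a young allele at high frequency points to selection rather than drift alone; converting generations to years would additionally require an assumed generation time.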

(b) Selection hypotheses on lactase persistence

The reasons why LP should provide such a selective advantage are still open to debate (see discussion below). However, since this trait has been mainly identified in dairying-practising or pastoralist populations [11] and since fresh milk and some milk products are the only known naturally occurring sources of lactose, it is unlikely that LP would be selected without a supply of fresh milk. Interestingly, the date estimates for the emergence of −13910*T and −14010*C bracket archaeological dates for the spread of domestic animals and dairying into Europe and the spread of pastoralism in Africa (south of the Sahara, into Kenya and northern Tanzania), respectively [11,20,39–42]. This supports the idea that LP coevolved with the cultural adaptation of dairying as a gene–culture coevolution process.

Nonetheless, the correlation between LP and milk consumption is not complete [11,16,31,32]. In lactase non-persistent individuals, the fermentation by colonic bacteria and osmotic effects of undigested lactose often cause symptoms such as abdominal pain, bloating, flatulence and diarrhoea. However, it has been shown that some lactase non-persistent individuals can consume lactose-containing products without any obvious ill effects. For example, the low LP frequency of Somali people living in Ethiopia does not prevent them from drinking more than 500 ml of milk per day without any obvious discomfort [16]. This inter-individual variation of the amount of lactose tolerated by lactase non-persistent people may be a result of variation in the composition of the gut flora (particularly the presence of lactic acid bacteria) [16,43]. Also, fermented dairy products (i.e. yoghurt or cheese) contain less lactose, allowing consumption by non-persistent individuals without any of the expected symptoms [43].

Two contrasting theories have been proposed to explain the co-distribution of LP and dairying practices. The culture-historical hypothesis [11,12,44] argues that LP developed, and was consequently selected, after milk production and dairy consumption spread. The opposing hypothesis (mentioned in McCracken [12] as the reverse-cause hypothesis) proposes that only populations whose frequency of LP was high enough adopted dairying. In other words, human groups were differentiated with regard to LP frequency by a process unrelated to milk consumption, namely genetic drift [45], before the invention of dairying. This view specifically assumes that drinking milk would not necessarily have conferred any selective advantage. The culture-historical hypothesis is better supported by recent findings from archaeological research (see §2c for details). In particular, organic residues preserved in archaeological pottery provided evidence for the use of milk after 8500 BP in the western part of Turkey [40], a region where LP frequency is low today (figure 1a). This suggests that domestic animals were milked before LP arose or was present at appreciable frequencies.

(c) The advantage of being lactase persistent

Several explanations have been proposed for how and why LP may have been selected. For example, it may simply be that milk is a good source of calories, or specifically an important source of protein and fat. The milk production of a prehistoric cow has been estimated to range between 400 and 600 kg per weaning period. Even when the milk necessary for the raising of the calves is subtracted, some 150–250 kg remains [46]. This is almost equivalent to the calorie gain from the meat of a whole cow. Hence, over the years, milking may have resulted in a greater energy yield than the use of cattle for meat. But it is likely that the benefits LP gave to early dairying populations extend beyond simply increasing the food supply or making more economic use of livestock. Furthermore, the specific benefits of dairying may have varied in space and time.

Strong selective pressures on LP may have been episodic and occurred only under certain extreme circumstances, such as drought, epidemic or famine. For example, milk would have represented an alternative food resource in between periods of crop cultivation. When no cereal food was available, for example between harvesting seasons or in periods of crop failure, LP individuals would have had an advantage. This is especially true for children after the period when lactase production is normally downregulated, a phase of life that shows an increased mortality according to osteological investigation of prehistoric skeletal collections [47,48]. In addition, Cook & Al-Torki [49] hypothesized that in regions where water was scarce, milk would have been used by pastoralist groups as a relatively pathogen-free fluid. If by drinking fresh milk, lactase non-persistent individuals were at risk from the potentially dehydrating effects of diarrhoea under such conditions, selection may have strongly favoured lactase-persistent individuals.

However, this arid climate hypothesis is less likely to have been relevant in Europe, where LP frequencies are at their highest. The observed correlation between latitude and LP frequency in Europe led Flatz & Rotthauwe [50] to propose the calcium assimilation hypothesis. Calcium is essential for bone health and its absorption in the gut is dependent on the presence of vitamin D. While some food sources such as fish are rich in this vital nutrient, most people in the world produce the majority of their vitamin D photochemically in the skin through the action of UVB on 7-dehydrocholesterol. However, UVB exposure is insufficient to produce the required quantities of vitamin D in people living at high latitudes for much of the year. This is unlikely to have been a problem for pre-Neolithic European hunter–gatherers who would have had a vitamin D-rich diet through their consumption of marine foods [51], but may have been a problem for early agriculturalists. Additionally, it has been proposed that a fibre-rich diet, owing to high consumption of cereal grains, can lead to a reduction in the amount of plasma 25-hydroxyvitamin D3 [52]. Thus, the consumption of milk has been suggested to have conferred an advantage to early LP farmers in regions such as the circum Baltic/North Sea area, where climatic conditions allowed high crop cultivation and consumption. Because milk contains small quantities of vitamin D and plenty of calcium, it can provide a valuable supplement at low-sunlight latitudes.

Bloom & Sherman [53] formulated the ecological dairying barrier hypothesis to complement the culture-historical model. They suggested that dairying, and therefore the evolution of LP, required environments that are favourable for raising milk-producing ungulates. Others proposed that the LP distribution is related to malaria [54]. They implied that LP is the ancestral phenotype and that lactase non-persistence would have been selected in regions where the disease was frequent. However, this hypothesis was not supported by studies of lactase non-persistence prevalence in glucose-6-phosphate dehydrogenase deficient subjects from Sardinia [55,56]. In a contrasting hypothesis, it has been suggested that a milk diet would provide protection against malaria by impairing a part of plasmodia metabolism (folate synthesis) [52] and thus lead to selection for LP. Pointing to the tight relation between dairying and LP, it has also been hypothesized that milk drinking was a privilege restricted to some individuals in highly hierarchical societies and that it spread as prestige class behaviour [13,44,57]. (For further discussion on the spread of a prestige-associated culture, see [58].)

A common feature of most populations with high frequencies of LP is a history of dairying activity [11]. The availability of fresh milk to some human groups has challenged their niche, thereby creating a potential genetic feedback for a need of continuous lactase expression throughout adult life. As they appear to be interdependent feedback processes, analyses of LP genetic and cultural diversity can hardly be conducted separately [13]. Hence, it is likely that the study of the process of when, how and why some populations kept and exploited ungulates will shed new light on our understanding of the distribution of LP. Indeed, archaeological data can be used to provide evidence for the presence of this cultural behaviour in past populations. For example, the analysis of milk residues [59] and the determination of kill-off patterns of animals on archaeological sites [60] constitute two archaeological methods that can inform on the emergence of dairying. Consequently, to fully understand the origins and evolution of LP, it is necessary to consider archaeological and archaeozoological research on the origins of domestication and dairying. In Europe, the spread of domestic animals is tightly associated with the diffusion of the Neolithic from the Near East.

2. Lactase persistence, the Neolithic transition and the history of dairying

(a) The spread of the Neolithic

Before 8400 BP, hunter–gathering was the only subsistence strategy in Europe, but by 6000 BP, when farming had spread over most of the continent (figure 2), it had become rare. The spread of farming into Europe was dependent on earlier developments in the Neolithic core zone of the Near East and Anatolia. Archaeologically, this process defined a new period referred to as the ‘Neolithic revolution’. It is characterized by the presence of polished stone tools and pottery, a more sedentary lifestyle and the management and subsequent domestication of certain animal and plant species. These features are often characterized as the ‘Neolithic package’, and this term has been widely used to define Neolithic sites, even though all these features are not always present together at these sites.

Figure 2.

Arrival dates and approximate geographical expansions of defined Early Neolithic cultures from [61].

Chronologically and geographically, the Neolithic culture spread from its original core zone to other parts of western Eurasia and simultaneously to north Africa. There are two opposing models for the spread of farming into Europe [62]. One, the cultural diffusion, or acculturation model, favours the idea that local hunter–gatherers adopted Neolithic practices once they gained access to them from farming neighbours. The alternative demic diffusion model holds that the Neolithic was carried throughout Europe by the movement of farming peoples (and consequently their genes) spreading into the territory of foragers. It is highly unlikely that the Neolithic transition could be wholly explained by either of these models alone. The reality is likely to have been more complex and involved local heterogeneity of farming adoption processes [63].

Analyses of European genetic diversity have attempted to address the relative role of acculturation and migrations on the European gene pool during the spread of farming and yielded contrasting results, depending on the methodology employed, the loci studied and the proxies for ancestral source populations used [64–70]. Recently, Bramanti et al. [71] presented ancient DNA evidence for a genetic discontinuity between late hunter–gatherers and early farming populations, at the beginning of the central European Neolithic 7500 BP, thus providing support for a migrating farmers model in this region. A full Neolithic lifestyle was established within a few generations in this area and cultural contact between farmers and local foragers persisted for some time. But the way in which Mesolithic hunter–gatherers and the Early Neolithic farmers interacted and how this led to an environment peopled exclusively by farmers is still debated [10,63,72].

(b) Domestic animals associated with the Neolithic transition

Archaeozoological data on domestic animals have been particularly valuable in establishing a more precise picture of the spread of the Neolithic and the new subsistence strategies associated with it. This involves bone analysis (animal species, sex, age at death, morphology, presence of cutting and cooking signs, 14C direct dating and ancient DNA analyses) and placing the results within their archaeological context. Morphological changes [73] and culling strategies [74] can be used to identify where and when wild animals started to become herded livestock. During the pre-pottery Neolithic B phase (PPN B) in the Neolithic core zone, several human groups started to manage wild animals at approximately the same time, in different places. After a period of hundreds or even thousands of years, this process finally resulted in phenotypes characteristic of domesticates. These domestication-associated phenotypes illustrate one of the consequences of human cultural niche construction on the evolution of other species [3,8,10,75].

In the context of the Neolithic, the oldest evidence for domestication is for goat and sheep (11 000 BP) followed by pig and cattle (10 500 BP) [73,74,76,77]. These domesticates soon spread from the Neolithic core region to a large part of the Near East, including Cyprus [77,78] and central Anatolia. However, true animal husbandry as a major economic activity in the Near East only began during the 10th millennium BP. Subsequently, but not earlier than 9000 BP, these domesticates spread into western and southern Anatolia [72,79]. From there, the spread of domesticates together with husbandry techniques followed two main routes (figure 2): (i) the Mediterranean route via the Aegean and Adriatic Seas, then moving further west to southern Italy, the Tyrrhenian Islands, southern France and the Iberian Peninsula; and (ii) a Danubian route via the Balkan Peninsula, further to the south-western part of central Europe and finally into central and northern Europe. The two routes might have met in Greece, in the Rhine valley and in the northwest of the continent before crossing to the British Isles [80].

Domestic goat (Capra aegagrus) and sheep (Ovis aries) must have been introduced to these areas by humans, as there is no evidence of indigenous wild progenitors in Europe. Ancient DNA studies indicate that cattle (Bos taurus) were also imported from Anatolia and, once in Europe, did not mix substantially with European wild cattle, the aurochs (Bos primigenius) [81–83]. This is probably because humans managed to keep these two forms of cattle reproductively separate, although substantial size differences and other consequences of domestication may also have limited interbreeding. In contrast, while domestic pigs seem to have been introduced into Europe from the Neolithic core zone, ancient DNA evidence indicates that they mixed substantially with local wild boar [84].

In terms of dairying, the further development of herd structure after the arrival of early domesticates in Europe is crucial. Indeed, neither the Early Neolithic in the Balkans (8200–7500 BP) nor the earliest Neolithic in central Europe (Linearbandkeramik, 7500–7000 BP) yield clear archaeozoological signs of highly specialized dairying. However, in the following periods, especially in central Europe, there is evidence of a progressive increase in herding associated with dairying, i.e. prevalence of female over male animals. This particularly holds true for cattle and goat [46]. Similarly, in southeast Europe, a transition to a more specialized dairying economy becomes increasingly apparent as early as the 7th millennium BP. It should be noted that it is very likely that dairying was practised long before this period [40,42] and may have represented a means of managing wild animals as early as the beginning of the domestication process. However, it did not become a substantial economic factor before the 7th millennium BP in most parts of Europe.

(c) Evidence of the consumption of dairy products

In addition to faunal remains, the detection of dairy fats associated with archaeological pottery is a powerful line of evidence for dairying activities in prehistory [59,85]. Investigations of organic residues in archaeological pottery have revealed a wide range of compound types, including dairy fats [85]. Using this method, the processing of milk has been identified in the western part of the present day Turkey as early as 8500 BP [40]. Similarly, evidence for the use of dairy products has been shown in Neolithic sites of Romania and Hungary around 7900–7450 BP [39], Britain around 6100 BP [86] and Scotland around 3000 BP [87]. Thus, the archaeozoological and residual lipid data clearly indicate that dairying was practised early in Neolithic Europe. However, ancient DNA data from central and northern Europe suggest that LP frequency was low during this period [88]. Thus, we may hypothesize that Early Neolithic people, among whom LP was rare or absent, initially practised dairying in south-eastern Europe and later migrated towards central and northern Europe, an area inhabited by foragers that occupied a different niche [10].

Indeed, stable isotope studies of bone collagen show that while hunter–gatherers largely relied on marine food, the farmers' diet, even on coastal sites, was mainly terrestrial [51]. However, other authors found no evidence of this switch in diet [89–91]. This may be because the localization of hunter–gatherers compared with farmer sites is biased towards coastal regions [90,92], or there may be other methodological issues [91,92]. An additional possibility is that in regions with abundant aquatic resources, the process of subsistence change offered by Neolithic farmers would have been slower. For example, in Scandinavia, where marine resources are abundant, there is no direct evidence of a sudden dietary transition [90]. In fact, instead of a replacement of hunter–gatherers by farmers, hunters from the Pitted Ware Culture (PWC) [89] coexisted for nearly 1000 years with the Neolithic Funnel Beaker Culture (TRB). However, it should be noted that recent ancient DNA evidence indicates little genetic continuity between PWC hunter–gatherers and modern Scandinavians [93].

In summary, LP and the main LP-associated allele in Europe, −13910*T, are found at highest frequencies in northwest Europe where dairying arrived latest (figures 1a,b and 2). A simple—phylogeographic—interpretation of the latter would be that LP first evolved in northwest Europe. But computer simulation studies have shown that when a population expands, the centre of distribution of an allele can be far removed from its location of origin [94,95]. This process is called ‘allele surfing’ and is thought to have occurred with the spread of farmers in Europe [65]. Furthermore, selection has—to an extent—shaped the distribution of LP [14,36,37], although it is unclear whether this selection was continuous or episodic, and whether it varied by latitude [50] or ecological zones [53]. It is therefore clear that to obtain a more complete picture of the coevolution of LP and dairying in Europe it is necessary to integrate cultural, demographic and selective processes. Computer simulation represents one of the most promising approaches to understand these processes because it can combine multiple sources of information. However, as with all modelling approaches, it is necessary to identify the key evolutionary parameters that have shaped the observed data. This is a non-trivial task when one considers the range of proposed hypotheses to explain the current distribution of LP in Europe today. In the following section, we review three of the simulation studies that have deepened our understanding of the coevolution of LP and dairying in Europe.

3. Review of simulation studies

(a) Aoki's model of lactase persistence and niche construction

The coevolution of LP and dairying simulated by Aoki [96] captures many of the features of niche construction, as later proposed by Laland et al. [5]. The evolution of populations, made up of four compound phenotypes (two genetic: LP versus non-LP individuals and two cultural: milk users versus non-users), was simulated. The compound phenotype of being an LP milk user was selectively advantageous (fitness of 1), whereas the other three phenotypes were assigned a selective value of 1−s (where s is the coefficient of selection). The effective population size (Ne) assumed was 100 individuals. At each generation, the model considered: (i) random mating between individuals (the random combinations of alleles and cultural behaviour from a parental generation to the next), (ii) transmission of the cultural trait with distinct probabilities of becoming a milk user, i.e. f(y) for an LP individual and g(y) for a lactase non-persistent individual, and (iii) random sampling of the individuals after the action of selection. At the end of the process, a correlation between the fixation of both the LP trait and the cultural behaviour was measured, and an average time until fixation of LP was calculated. This study addressed three questions: (i) if the evolution of both LP and milk consumption is mutually dependent, what would the correlation between their frequencies be? (ii) Does the type of selection, either strictly genetic or culturally induced, produce differences in the rate at which an allele reaches fixation? (iii) Do the frequency estimates of European LP (falling in the interval (0.05–0.70)) fit the hypothesis of a gene–culture coevolution process that started 6000 years ago?
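The generation cycle described above can be sketched in a few lines of Python. This is a loose illustration rather than a reimplementation of Aoki's model: it assumes haploid-like inheritance of LP from a single randomly chosen parent, treats y as the frequency of milk users in the parental generation, and uses hypothetical linear forms for f(y) and g(y). The default of 240 generations corresponds to 6000 years only under an assumed 25-year generation time.

```python
import random


def simulate_gene_culture(n=100, s=0.05, p_lp=0.05, y0=0.5, generations=240,
                          f=lambda y: 0.5 + 0.5 * y,   # hypothetical f(y)
                          g=lambda y: 0.5 * y,         # hypothetical g(y)
                          seed=0):
    """Return a list of (LP frequency, milk-user frequency) per generation."""
    rng = random.Random(seed)
    # Each individual is a pair (is_lp, is_milk_user).
    population = [(rng.random() < p_lp, rng.random() < y0) for _ in range(n)]
    history = []
    for _ in range(generations):
        y = sum(milk for _, milk in population) / n
        offspring = []
        for _child in range(n):
            # (i) genetic inheritance: simplified to one random parent.
            lp, _parent_milk = rng.choice(population)
            # (ii) cultural transmission: f(y) for LP, g(y) for non-LP.
            milk = rng.random() < (f(y) if lp else g(y))
            offspring.append((lp, milk))
        # (iii) selection then random sampling: LP milk users have
        # fitness 1, the other three compound phenotypes 1 - s.
        weights = [1.0 if (lp and milk) else 1.0 - s for lp, milk in offspring]
        population = rng.choices(offspring, weights=weights, k=n)
        history.append((sum(lp for lp, _ in population) / n,
                        sum(milk for _, milk in population) / n))
    return history


if __name__ == "__main__":
    final_lp, final_milk = simulate_gene_culture()[-1]
    print(final_lp, final_milk)
```

Because the advantage is attached only to the compound phenotype, LP individuals who do not use milk gain nothing, which is one way to see why culturally conditioned selection acts more slowly than selection on the genetic trait alone.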

Although this model did not take into account cultural diffusion from neighbouring populations, an interesting feature is the use of distinct probabilities for transmission of the culture of milk consumption according to the LP phenotype of an individual. A key result was that an incomplete correlation between LP and milk consumption frequencies is expected. This actually represents inter-individual variability in the non-persistence phenotype, and corresponds either to LP individuals that do not drink milk or to lactase non-persistent individuals that do drink milk. It explicitly shows that the gene–culture coevolution hypothesis is still credible even if the correlation between the genetic and the cultural traits is incomplete. This is in accordance with previous analytical gene–culture coevolution studies [7,97,98] which have notably shown that genotype–phenotype correlation can be reduced by a factor depending on environmental and selection variances [97].

Another observation drawn from this model is that the change in LP frequency is slower when the selection is conditioned by culture. This means that in order to explain such a high frequency of LP in north-western Europe, we must consider that either every LP individual actually did drink milk, or that other processes have been involved to generate this distribution. The author consequently suggested that to be able to detect such a change in allele frequency since the start of dairying (then considered to have occurred 6000 years ago), either the effective population size (Ne) was relatively small (100 individuals as simulated in the study), or the selection coefficient in favour of the phenotype was very high (more than 5% if Ne = 500). Such a high value of 5 per cent falls well within the confidence intervals ((1.4–19%) and (1–15%)) of recent studies ([36,20], respectively) inferred directly from genetic data. However, lower values of the selection coefficient may be expected to fit the observed LP pattern, as it is now known that the start of dairying was much earlier than 6000 years ago [40], leaving more time for the genotype to be selected. In addition, both genetic and archaeological studies point to the importance of the demographic process when studying the Neolithic diffusion in Europe [99,100], whereas Aoki's simulations were performed with a constant population size of 100 individuals. Moreover, demographic expansion has been shown to have a potentially dramatic effect on the diffusion of new mutations [94,95,101]. Hence, we may expect that the assumption of a constant population size could lead to an overestimation of the rate of frequency change and of the selection coefficients.

(b) Spatial variation in selection intensity

While the selection coefficients explored in Aoki's model were assumed to be constant, in a recent study, Gerbault et al. [18] modelled a geographical structuring of selection pressure by latitude, thereby testing explicitly the calcium assimilation hypothesis [50]. The evolution of a dominant allele associated with LP was simulated in four Near-Eastern and 22 European populations since the Neolithic transition in their respective regions, according to two demographic models: cultural diffusion and demic diffusion [63]. Each of these models was tested using either a constant selection coefficient or a selection coefficient varying with latitude. Thus, a total of four scenarios were tested combining the two selection models and the two demographic models. The program used (called Selector) models four parametrized processes: (i) random genetic drift, (ii) logistically regulated population growth, (iii) positive selection varying between 0 and 3 per cent, and (iv) time elapsed since the onset of the Neolithic. To simulate the cultural diffusion model, the initial LP allele frequency was set at 1 per cent in all 26 populations and evolved according to the above processes. One important assumption is thus that the LP-associated allele was already present in Europe before the Neolithic but at a low frequency (1%). Under the demic diffusion model, three populations were used as sources in the Near East (Lebanon, Syria and Iran). The remaining populations, including the fourth Near Eastern population (Cyprus), were populated by sampling from neighbouring populations along the presumed route of the spread of farming, and at the times indicated from archaeological data [41]. A maximum-likelihood test was performed to evaluate independently in each sample what selection coefficient best fitted the observed data (the LP frequency for each of the 26 populations). 
Moreover, the four simulated scenarios were formally compared using an approximate Bayesian computation (ABC) approach [102] based on LP frequencies within samples [103].
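The four processes listed for Selector can be combined in a few lines. The following is a minimal Wright-Fisher-style sketch written for illustration, not the Selector program itself; the function names, the discrete logistic update and the order of operations within a generation are all assumptions.

```python
import random


def logistic_growth(n, rate, capacity):
    """Discrete logistic update of population size (at least 1 individual)."""
    return max(1, round(n + rate * n * (1.0 - n / capacity)))


def select_dominant(p, s):
    """Allele frequency after selection; carriers have fitness 1 + s."""
    q = 1.0 - p
    mean_fitness = (1.0 - q * q) * (1.0 + s) + q * q
    return p * (1.0 + s) / mean_fitness


def drift(p, n, rng):
    """Binomial sampling of 2n gene copies (diploid population of size n)."""
    copies = sum(rng.random() < p for _ in range(2 * n))
    return copies / (2 * n)


def simulate_population(p0, s, generations, n0, rate, capacity, rng):
    """Allele frequency after `generations` of growth, selection and drift.

    `generations` plays the role of the time elapsed since the onset of
    the Neolithic in the population considered.
    """
    p, n = p0, n0
    for _ in range(generations):
        n = logistic_growth(n, rate, capacity)
        p = drift(select_dominant(p, s), n, rng)
    return p
```

Calling `simulate_population` repeatedly with a seeded `random.Random` for a grid of selection coefficients between 0 and 0.03 gives, for each coefficient, a distribution of final frequencies that could then be compared with an observed LP frequency.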

In this study [18], the scenario that gave the highest relative probability of obtaining the observed data was the demic diffusion model combined with a selection coefficient varying with latitude. Indeed, according to this model, genetic drift alone can explain frequencies as low as those observed in southern Europe, as previously suggested by Nei & Saitou [45]. However, in northern Europe, positive selection coefficients were required to drive LP frequencies to their observed values (figure 3). This result supports the calcium assimilation hypothesis. A complementary explanation for increased selection strength at higher latitudes is that the lower temperatures would allow milk to remain fresh for longer [44,57]. If this applies, LP would illustrate a specific case of niche construction where localized access to a resource is variable [5], i.e. fresh milk. Indeed, niche construction modelling [104] has shown that in such a situation a strong association between a cultural and a genetic trait is expected, as is observed for LP and dairying in northern Europe.

Figure 3.

Scenarios simulated in [18] and selection coefficients required to fit the observed estimates of LP frequencies (taken from [18]). Bars represent the 95% confidence interval of the selection coefficient estimated for the population and the central point is the MLE (maximum-likelihood estimate). Populations are ordered from the highest to the lowest latitude: Danish (Dan), Irish (Iri), German from Bremen (Bre), German from Berlin (Ber), English (Eng), Polish (Pol), Czech (Cze), German from Stuttgart (Stu), German from Munchen (Mun), Austrian (Aus), French from Nantes (Nan), Swiss (Swi), Slovenian (Slo), Italian from Brescia (Bre), French from Nice (Nic), Spanish from Santiago de Compostela (Com), Italian from Roma (Rom), Italian from Napoli (Nap), Sardinian (Sas), Spanish from Valencia (Val), Greek (Gre), Sicilian (Sic), Cypriot (Cyp), Lebanese (Leb) and Iranian (Ira) (doi:10.1371/journal.pone.0006369.g002). (a) Demic diffusion, gene–culture coevolution; (b) demic diffusion, calcium assimilation; (c) cultural diffusion, gene–culture coevolution; and (d) cultural diffusion, calcium assimilation.

Even though the model of Gerbault et al. [18] does not simulate a wave of advance, it illustrates the importance of demography, since the demic diffusion model with variation in the selection coefficient was far more likely (99.1%) than the cultural diffusion model (0.9%). The selection coefficients estimated range between 0.8 and 1.8 per cent, and the authors suggested that if the simulations were closer to a wave of advance model, this range might have been lower. It should be noted that these estimates are not constant over all the populations considered, as under the best-fitted models the selection coefficient varied latitudinally and approached zero in southern Europe.
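Relative probabilities of this kind can be obtained, in the simplest form of ABC, by counting how often simulations from each scenario land close to the observed data. The sketch below uses plain rejection sampling with equal prior weight on each scenario; it should not be read as the exact procedure of [102,103], and `simulators`, `tolerance` and the Euclidean distance are illustrative choices.

```python
import random


def abc_model_choice(simulators, observed, n_sims, tolerance, rng=None):
    """Rejection-ABC estimate of posterior model probabilities.

    simulators: dict mapping a scenario name to a function rng -> list of
                summary statistics (e.g. simulated LP frequencies).
    observed:   list of observed summary statistics, same length.
    Each scenario is simulated n_sims times (equal model priors); the
    share of accepted simulations per scenario approximates its
    posterior probability. Returns None if nothing is accepted.
    """
    rng = rng or random.Random()
    accepted = {name: 0 for name in simulators}
    for name, simulate in simulators.items():
        for _ in range(n_sims):
            simulated = simulate(rng)
            distance = sum((a - b) ** 2
                           for a, b in zip(simulated, observed)) ** 0.5
            if distance <= tolerance:
                accepted[name] += 1
    total = sum(accepted.values())
    if total == 0:
        return None
    return {name: count / total for name, count in accepted.items()}
```

In practice each simulator would draw its parameters (for instance the selection coefficient) from a prior before simulating, and the tolerance would be chosen so that only a small fraction of simulations is retained.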

(c) Demic and cultural diffusion of farming

In a more complex spatially explicit model, Itan et al. [105] simulated the spread of farmers from the Near East to Europe over the last 9000 years, taking into account potential interactions with hunter–gatherers. The geographical unit of this simulation was a deme, and in total the simulated world was made of 2375 land demes, each containing three interacting populations: hunter–gatherers, dairying farmers and non-dairying farmers. At each generation, each population underwent a succession of seven processes: (i) Logistically regulated population growth, in which each deme had a fixed carrying capacity determined by climatic and elevation factors. The carrying capacities of dairying and non-dairying farmers were equal, whereas those of hunter–gatherers were 50 times smaller [106,107]. (ii) A unidirectional migration process, modelled as a stochastic Gaussian random walk from one deme to another (a process equivalent to demic diffusion). (iii) Cultural diffusion, where a proportion of individuals from one culture ‘converted’ to one of the two other cultural groups, based on the relative ‘dominance’ of the other groups in the focal and eight surrounding demes. (iv) Intra-demic gene flow between different cultural groups within a deme. (v) Inter-demic gene flow between neighbouring demes belonging to the same cultural group. (vi) Selection acting on an LP allele only in the dairying farmers cultural group. (vii) Drift of the LP allele in all cultural groups in all demes, modelled as a binomial sampling process. The LP-associated allele was set to appear in a randomly chosen location when the population size of dairying farmers in this deme reached a critical value (20 individuals). This ensured that the LP-associated allele appeared on, or near, the wavefront of dairying farmer diffusion. Its frequency was updated according to the evolution of a dominant allele with parametrized selective advantage [108], which remained constant in each simulation. 
This selection in turn drove additional increases (over and above the logistically regulated population growth) in the population sizes of dairying farmers [108].
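The within-deme part of this scheme can be sketched as follows. This is a deliberately reduced illustration, not the code of Itan et al. [105]: it covers only growth (i), selection restricted to dairying farmers (vi) and binomial drift (vii) in a single deme, and omits migration, cultural conversion, gene flow and the selection-driven boost to dairying farmer numbers. The class and function names, the starting sizes and the discrete logistic update are assumptions.

```python
import random
from dataclasses import dataclass


@dataclass
class Group:
    size: int        # number of individuals in this cultural group
    lp_freq: float   # frequency of the LP-associated allele
    capacity: int    # carrying capacity of the deme for this group
    selected: bool   # True only for dairying farmers


def make_deme(farmer_capacity, start_size=20):
    """Three groups per deme; hunter-gatherer capacity is 50 times smaller."""
    hg_capacity = max(1, farmer_capacity // 50)
    return {
        "hunter_gatherers": Group(min(start_size, hg_capacity), 0.0,
                                  hg_capacity, False),
        "non_dairying_farmers": Group(start_size, 0.0,
                                      farmer_capacity, False),
        "dairying_farmers": Group(start_size, 0.0, farmer_capacity, True),
    }


def step_group(group, s, growth_rate, rng):
    """One generation: logistic growth, selection (if dairying), then drift."""
    n = group.size
    n = max(1, round(n + growth_rate * n * (1.0 - n / group.capacity)))
    p = group.lp_freq
    if group.selected:
        # Dominant allele: carriers have fitness 1 + s, non-carriers 1.
        q = 1.0 - p
        p = p * (1.0 + s) / ((1.0 - q * q) * (1.0 + s) + q * q)
    # Drift as binomial sampling of 2n gene copies.
    copies = sum(rng.random() < p for _ in range(2 * n))
    group.size, group.lp_freq = n, copies / (2 * n)


def step_deme(deme, s, growth_rate, rng):
    """Advance every cultural group in the deme by one generation."""
    for group in deme.values():
        step_group(group, s, growth_rate, rng)
```

Because gene flow between groups is left out here, an allele introduced in the dairying farmers never reaches the other two groups; adding steps (iv) and (v) is what would let it spread.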

After the simulations, the authors estimated parameters of the model using ABC [102] with a particular focus on two related questions: (i) when and where in Europe did LP-dairying coevolution begin, and (ii) what factors drove the spread of LP to produce the observed European pattern of LP frequencies? Simulations were fitted to both observed −13910*T frequencies at 12 European locations and the corresponding dates of arrival of farmers [41] at 11 of these locations. Both the dates estimated (between 6256 and 8683 years BP) and the geographical area identified (the region between central Europe and the northern Balkans; figure 4) for the origin of LP-dairying coevolution correlate well with the time and location of the emergence of the Linearbandkeramik (LBK) culture (figure 2). The LBK is recognized to have been the direct forerunner of a Neolithic cattle-based economy [109], whereas at Mediterranean Neolithic sites, faunal assemblages are more variable in composition and in some places dominated by sheep and goat [42,46,109]. Indeed, cows provide quantitatively more milk than sheep or goats, allowing cattle-based farmers to have a better supply, thereby supporting a stronger selection for LP. Even though these origin time and location estimates were not independently derived, as simulations were conditioned on known farming arrival dates [41], recent results from spatially explicit niche-construction modelling [9] corroborate this hypothesis. In this latter model [9], local mating of niche constructors (i.e. LBK-economy distribution and diffusion) has the effect of increasing the local concentration of the resource (i.e. fresh milk), thereby generating stronger selection in favour of the resource-dependent trait (i.e. LP).

Figure 4.

Approximate posterior density of region of origin for LP—dairying coevolution (taken from [105]). Points represent regression-adjusted latitude and longitude coordinates from simulations accepted at the 0.5% tolerance level. Shading was added using two-dimensional kernel density estimation (doi:10.1371/journal.pcbi.1000491.g003).

Indeed, from the study of Itan et al. [105], the selection coefficients necessary to explain the −13910*T allele frequency distribution in Europe were estimated to range from 5.2 to 15.9 per cent. Although very high, this range falls within the range estimated (1.4–19%) from molecular data [36], but shows less overlap with the range estimated by Gerbault et al. [18]. However, as the selection strength inferred by Itan et al. [105] applies to less than half of the overall population (dairying farmers), the population-wide selection coefficient is likely to be around half of the value reported. A key difference between the two studies is that Gerbault et al. [18] explicitly modelled, and found support for, a latitudinal effect on LP selection coefficients [50] while Itan et al. [105] did not. Even though it is difficult to separate the effects of selection and demography in their model (as demographic parameters and the selection coefficient were estimated simultaneously), Itan et al. [105] argued that such a latitudinal effect is not necessary to explain the higher observed frequencies of LP in northern Europe. Indeed, they found that without any assumption of a latitudinal effect on selection, their model under the estimated parameters predicted higher LP frequencies in northern Europe than observed, and typically lower frequencies in southern Europe than observed. Thus, the addition of a latitudinal effect on selection would produce an even poorer fit to the LP data and so could be thought of as having ‘negative explanatory power’. However, it should be noted that Itan et al. [105] did not explicitly model a latitudinal effect on selection, and so have not formally rejected it.
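The halving argument can be written out under one simplifying assumption that is ours, not that of Itan et al. [105]: selection of strength s acting on a fraction f of the population averages linearly over the whole population.

```latex
s_{\mathrm{pop}} \approx f\,s, \qquad f < 0.5
\quad\Longrightarrow\quad
s_{\mathrm{pop}} \lesssim 0.5 \times (5.2\text{--}15.9\%) \approx 2.6\text{--}8.0\%
```

Even after this adjustment, the range remains above the 0.8–1.8 per cent estimated by Gerbault et al. [18].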

Another key difference between the two studies is the values estimated for the selection coefficient, which are larger in Itan et al. [105] than in Gerbault et al. [18]. In the study of Itan et al. [105], the best fit was obtained when the allele was first selected in dairy farmers, in central Europe/the northern Balkans, and subsequently spread into neighbouring regions. High selection coefficients were thus required to obtain the high frequencies currently observed in north and central Europe in a relatively short period of time. In the model of Gerbault et al. [18], the allele was assumed to have reached non-negligible frequencies before it had spread into central Europe as it was already present in the Near East at the onset of the Neolithic transition. Hence, while selection was required, this was to a lesser extent than if it had started in central Europe. There is a potential ‘trade-off’ between selection intensity and the timing and location of origin of the gene–culture coevolutionary process, and in future work, exploring combinations of values for these parameters will bring insights into the means of diffusion not only of LP but of other genetic traits as well.

4. Discussion and perspectives

(a) Simulation overview

Although simulation studies are unlikely to recover the precise evolutionary history of LP and dairying, they have two main advantages over purely genetic or purely archaeological studies. First, they allow the integration of information from multiple sources (genetics, archaeology, ecology). Second, they provide a formal comparison of alternative scenarios, according to the parameters taken into account. Many combinations of parameters can be tested, and extreme scenarios (such as cultural or demic diffusion) can be evaluated statistically, and potentially excluded. Furthermore, ABC methods [102,103] have proved to be very useful when evaluating the fit of observed data on complex scenarios tested, adding power to simulation studies [18,105]. Clearly, simulations performed at the continental scale help to understand general features and processes of the European Neolithic transition, and to explore alternative possibilities that may have led to observed patterns of genetic diversity.

(b) Demography and niche construction

The above simulation studies of LP/dairying coevolution, as well as ecological and archaeological information, suggest that selection on LP is unlikely to have been constant over time and space during its spread throughout Europe, and show that the role of demography in its diffusion cannot be ignored. The extent to which combinations of these two phenomena have shaped the LP distribution as it is observed in Europe remains unanswered. While Itan and colleagues [105] used a complex demographic model, the population growth was logistically regulated for each group, with no direct consequence of the growth of one group on the growth of the others (i.e. competition between cultural groups for land resources). But as almost no hunter–gatherers survive in Europe today, it is reasonable to assume that niche modification by farmers may have had an effect on the distribution of hunter–gatherers that led ultimately to their disappearance (because of the distinct abilities of both populations to niche construct [10]). As an adaptation of the model of Itan et al. [105], an alternative dynamic could be proposed to investigate how demographic constraints may have affected the diffusion of the trait. Inclusion of density-dependent competition, as implemented by Currat & Excoffier [110], should bring new insights into the effect of these demographic processes. This improvement would allow regional variation in the simulation of the Neolithic spread, with a higher carrying capacity of hunter–gatherer populations in areas where marine resources are abundant, thereby potentially changing the competition dynamics between hunter–gatherers and farmers. This would also make it possible to account for both localized resource availability and spatially structured populations when investigating the evolution of the association of LP (i.e. the resource-dependent trait) and dairying (i.e. the niche-constructing practice) [9,104]. 
This may be relevant for the co-habitation of hunters from the PWC [89] with the TRB farmers, who coexisted for nearly 1000 years in Scandinavia. Further analyses assessing this co-habitation would be appropriate to explicitly evaluate how the co-influences of genes, environment and pre-existing culture affect differential transmission of Neolithic cultural forms [111].

The spread of LP is a good example of niche construction for two reasons. First, human groups who started to drink milk modified their pre-existing selection pressures, thereby generating an evolutionary feedback that may have advantaged some individuals over others in certain environmental conditions. Second, because this behaviour has been perpetuated from the Neolithic to the present day, the coevolution of LP and dairying illustrates how culture can affect the genetic diversity of human populations. That is why niche construction factors should be considered when studying human adaptation. Gene–culture coevolution studies have shown that both genes and culture exert distinct, but interacting, influences in the evolution of human phenotypes [98,111]. More precisely, cultural processes have been shown to alter the outcome expected under purely genetic transmission [7,97,98]. Even though dairying culture has been taken into account, it should be noted that the effect of differential cultural transmission has not been explicitly modelled in the three simulation studies described above. It is thus expected that the combination of demographic information and differential cultural and genetic transmission influenced by selection would enlarge the possible evolutionary conditions that may have shaped the worldwide LP distribution observed today.

(c) Migration and environment

Other environmental parameters are likely to have affected the distribution of the culturally distinct populations, and the speed of the Neolithic transition. Indeed, niche-construction studies have shown that environmental variables should not be seen as static but as dynamic features that change according to climate and human activities [5], thereby modifying the selection pressure to which humans are subjected [35]. For example, Itan et al. [105] took into account environmental factors to determine different carrying capacities according to the economy type of the populations. In addition to this, other environmental pressures, such as vegetation cover and natural water access, may have conditioned the movement of farming populations with their domesticated animals [57]. These environmental variables can allow the modelling of potential regional variation in the spread of the Neolithic, as suggested by archaeological studies [112]. This will have an initial advantage of bringing a better understanding of the effect of the environment on the spread of agriculture and farming. Then, the relative importance of environmental variables that influence migrations can be evaluated. However, it should be noted that increasing the number of parameters in a model is generally to be avoided unless necessary. In particular, the more complex a model is (the closer to reality it is), the more difficult it becomes to interpret, as isolating the effects of different processes and parameters can be extremely complicated. Furthermore, the parameter space of a complex model can be very difficult to explore by simulation.

(d) The importance of an interdisciplinary approach

While selection on LP has been inferred, it has not been shown directly [113]. Other pleiotropic factors may be involved that affect the ability to digest milk on the one hand or that affect selection on LCT on the other. What is clear from the study of LP-associated genetic variants is that LP has emerged independently in different regions of the world [14,20,23,31,32,114]. While some alleles show evidence of strong directional selection, soft selective sweeps [115] (whereby ancestral genetic variation is associated with different adaptive substitutions [116]) have been invoked [16] to explain how several alleles can be maintained in a population through mutation and migration processes. It would be of interest to assess whether such a process applies more widely than in Africa and the Middle East. Ongoing research on the LCT gene and the regulation of its expression, and further sampling, with associated sequencing and phenotyping, will be necessary to give a more complete picture of the structure of LP-associated genetic diversity [15]. Additionally, archaeological assemblage studies will provide useful information on the time and location of the development of milk-drinking behaviour. Finally, simulation studies can assess different possible scenarios by taking into account information obtained from different fields of study, particularly by employing ABC methods [102,103]. LP evolution will be better understood by combining data from these different fields, and by generating expectations of different data types in simulations. More broadly, the study of LP evolution is likely to shed light on the interactions and relative contributions of distinct processes involved in shaping human variation.

5. Conclusion

The coevolution of LP and dairying appears to be not as straightforward a process as it may have first seemed. While several alleles seem to be associated with LP, the pattern of LP in Europe appears peculiar, as a single variant (−13910*T) has been identified as explaining most of the modern distribution of LP [15]. On this continent, the spread of dairying-associated practices goes back to Early Neolithic periods, and it is likely that archaeological studies will soon give a more precise picture of the early development of dairying in Europe. Archaeology notably underlines how complex the migration processes may have been. It may not be surprising then, even though the models were different, that simulation studies on the spread of LP in Europe highlight the importance of demography for the distribution of the −13910*T allele. Some of these simulation studies also hint that selection may not have been constant over time and space. It thus appears that, in order to disentangle the effects of selection from those of demography on the distribution of both LP phenotype and genotypes, there is a need to better understand spatial patterns of genetic variation under neutrality [66,101,117,118]. In turn, the extent to which LP-associated genetic variation can be explained by these patterns will determine what can be said about the adaptation process [119]. Inference on the coevolution of LP (and its associated alleles) and dairying is likely to provide a useful framework for future studies on the evolution of other adaptive traits in which niche construction is invoked.


We thank two anonymous referees for their comments. P.G. and A.L. are funded by an EU Marie Curie FP7 Framework Programme grant (LeCHE, grant ref: 215362-2). Y.I. was funded by the B'nai B'rith/Leo Baeck London Lodge and Annals of Human Genetics scholarships. We also thank the AHRC Center for the Evolution of Cultural Diversity (CECD) and the Center for Mathematics and Physics in the Life Sciences and Experimental Biology (CoMPLEX), UCL, for supporting this research.


