Human leucocyte antigen (HLA) loci have a complex evolution where both stochastic (e.g. genetic drift) and deterministic (natural selection) forces are involved. Owing to their extraordinary level of polymorphism, HLA genes are useful markers for reconstructing human settlement history. However, HLA variation often deviates significantly from neutral expectations towards an excess of genetic diversity. Because HLA molecules play a crucial role in immunity, this observation is generally explained by pathogen-driven-balancing selection (PDBS). In this study, we investigate the PDBS model by analysing HLA allelic diversity on a large database of 535 populations in relation to pathogen richness. Our results confirm that geographical distances are excellent predictors of HLA genetic differentiation worldwide. We also find a significant positive correlation between genetic diversity and pathogen richness at two HLA class I loci (HLA-A and -B), as predicted by PDBS, and a significant negative correlation at one HLA class II locus (HLA-DQB1). Although these effects are weak, as shown by a loss of significance when populations submitted to rapid genetic drift are removed from the analysis, the inverse relationship between genetic diversity and pathogen richness at different loci indicates that HLA genes have adopted distinct evolutionary strategies to provide immune protection in pathogen-rich environments.
The molecules of the major histocompatibility complex (MHC) in humans, or human leucocyte antigen (HLA) system, play a central role in immunity by presenting virus- or pathogen-derived antigenic peptides to T cells and triggering an immune response (see recent reviews by [1,2]). The HLA molecules are encoded by a total of six class I and 18 class II genes located among many other genes on the short arm of chromosome 6 (6p21.3), six of them (HLA-A, -B, -C for class I, and HLA-DPB1, -DRB1, -DQB1 for class II) being particularly polymorphic, with up to 2125 currently known alleles for HLA-B (http://hla.alleles.org/nomenclature/stats). Class I and class II molecules differ both in their structure and function: class I molecules, expressed in almost all nucleated cells, consist of a single α chain (non-covalently bound to a small β2-microglobulin polypeptide) and present virus-derived intracellular peptides to CD8+ cytotoxic T cells, leading to the lysis of the infected cells. Class II molecules, expressed by specialized cells of the immune system such as dendritic cells or macrophages, consist of two chains, α and β, and present peptides derived from endocytosed extracellular antigens (e.g. from parasites) to CD4+ helper T cells, leading to a humoral (antibody-mediated) immune response destroying the foreign antigens. In both cases, peptides are presented within a pocket-like ‘peptide-binding site’ (PBS) which is a portion of the α chain for HLA class I, and combined portions of the α and β chains for HLA class II molecules (therefore called αβ heterodimers). This groove is highly polymorphic; actually, almost all the observed HLA DNA variation is located in exons 2 and 3 for class I, and in exon 2 for class II, of their corresponding HLA genes and results in amino acid substitutions in the PBS. Considering the total HLA variation existing in a given human population, a very large set of antigenic peptides may thus be recognized.
A common theory, initially suggested by studies of the mouse MHC, proposes that, within populations, individuals who are heterozygous at HLA loci would have a higher fitness in pathogen-rich environments. At the population level, this would explain the maintenance of a very high HLA diversity, which is generally confirmed by an excess of heterozygotes compared with Hardy–Weinberg proportions [6,7]. At the molecular level, it accounts for the higher rate of non-synonymous than synonymous substitutions observed in the PBS [8,9].
This theory, currently known as the pathogen-driven-balancing selection (PDBS) model, is however not well understood from a functional point of view. Moreover, other mechanisms such as negative frequency-dependent selection and fluctuating selection in space and time may represent alternative models explaining the above-mentioned observations [10,11]. In the first case, rare alleles would be advantageously selected against pathogens evading common HLA alleles, resulting in a dynamic process of allelic frequency fluctuation through host–pathogen coevolution [12–14]. In the second case, different sets of HLA alleles would be selected positively at different periods according to geographical and/or temporal changes in the type and prevalence of pathogens, also leading to allelic frequency variation through time [11,15]. The importance of those three different mechanisms in shaping the diversity of HLA loci is very difficult to determine as they are not mutually exclusive and their consequences are very similar. Moreover, the specific evolution of either class I or class II HLA molecules, which play different roles in the immune defence, may not necessarily follow the same mechanisms.
Our laboratory has been analysing HLA data in human populations for many years with the aim of reconstructing human peopling history [16–25]. Our group and others have confirmed that the HLA polymorphism evolves under different kinds of mechanisms, i.e. stochastic factors related to the geographical and demographic expansion of modern humans throughout the world, and natural selection [6,7,20,23,26,27]. Disentangling these different effects is a real challenge, but some results have recently been conclusive in finding a significant, although low, coefficient of selection for at least one HLA locus, explaining a lack of differentiation of human populations across a geographical barrier. Whether the excess of genetic diversity found for HLA in comparison with neutral loci is directly related to pathogens' history has yet to be determined.
A valuable approach is to compare the genetic diversity between populations living in different parasitic contexts representing different risk factors. Existing databases of worldwide distributions of pathogens may be used to achieve this aim. On the basis of HLA allele frequency data of 61 human populations worldwide and values of pathogen richness provided by the Global Infectious Diseases and Epidemiology Online Network database (GIDEON; http://www.gideononline.com) for different countries of the world, Prugnolle et al. reported that HLA genetic variation was correlated with pathogen richness, although human colonization history explained a much higher proportion of HLA genetic diversity worldwide. As the authors used a dataset made up of only HLA class I frequencies to investigate the correlations between genetic diversity and virus richness, we decided to perform a similar analysis by extending it to a larger population database including both HLA class I and class II frequencies, which we compared with the distributions of both viruses alone and all kinds of pathogens. Our aim was to check whether the PDBS hypothesis could be validated by this approach for the seven highly polymorphic HLA-A, -B, -C, -DPB1, -DQA1, -DQB1 and -DRB1 loci.
2. Material and methods
(a) Human leucocyte antigen polymorphism in human populations
We used a very large database including information on genetic diversity on HLA class I (A, B, C) and class II (DPB1, DQA1, DQB1, DRB1) genes for a large number of populations. This database was compiled by Buhler & Sanchez-Mazas, mainly from the 12th and 13th International Histocompatibility Workshops [29,30], but also from various published reports, and completed with data from our own laboratory. These data consist of allele frequencies defined at a four-digit level and previously subjected to a quality control procedure (see Buhler & Sanchez-Mazas for details). We removed from the original dataset populations that recently migrated from one country to another (e.g. Egyptian Copts living in the United States of America) or which have been categorized as admixed populations (OTH) by Buhler & Sanchez-Mazas. The rationale for this was to avoid putative mismatching between the pathogen environment where the population was sampled and the HLA polymorphism of this population, as its HLA profile could have evolved in response to the pathogen environment of the region where the population initially lived. The remaining 535 population samples were classified into 10 geographical groups, following the recommendation from the International Histocompatibility Working Group, Anthropology/Human Genetic Diversity component, of the 13th Histocompatibility Workshop. The number of populations studied for each HLA locus was 88 for HLA-A, 80 for HLA-B, 62 for HLA-C, 56 for HLA-DPB1, 57 for HLA-DQA1, 88 for HLA-DQB1 and 104 for HLA-DRB1.
To describe genetic diversity at the HLA complex within each population, we used two different statistics, the allelic richness and the expected heterozygosity. Allelic richness represents the number of alleles expected in a population sample of size equal to the rarefaction size 2n (i.e. size of the smallest sample of n individuals at this locus). Rarefaction sizes are 50 for HLA-A, 58 for -B, 56 for -C, 60 for -DPB1, 66 for -DQA1 and -DQB1 and 52 for -DRB1. Allelic richness was measured using the rarefaction method as in El Mousadik & Petit, according to:
$$a_r = \sum_{i=1}^{k}\left[1 - \frac{\binom{2N - N_i}{2n}}{\binom{2N}{2n}}\right]$$
where k is the number of alleles in the sample and N_i the number of occurrences of the ith allele among the 2N sampled genes.
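The rarefaction calculation described above can be sketched in a few lines. This is a minimal illustration, assuming the standard El Mousadik & Petit estimator (one minus the probability that an allele is absent from a subsample of 2n genes, summed over alleles); the function name and the example counts are hypothetical and are not taken from the study's data.

```python
from math import comb


def allelic_richness(allele_counts, rarefaction_size):
    """Expected number of distinct alleles in a random subsample of genes.

    allele_counts: occurrences N_i of each allele among the 2N sampled genes.
    rarefaction_size: number of genes (2n) drawn without replacement.
    """
    total_genes = sum(allele_counts)  # 2N
    if not 0 < rarefaction_size <= total_genes:
        raise ValueError("rarefaction size must lie between 1 and the sample size")
    all_subsamples = comb(total_genes, rarefaction_size)
    richness = 0.0
    for count in allele_counts:
        # Probability that allele i is absent from the subsample.
        p_absent = comb(total_genes - count, rarefaction_size) / all_subsamples
        richness += 1.0 - p_absent
    return richness


# Hypothetical sample: three alleles observed 6, 3 and 1 times among 10 genes,
# rarefied to a subsample of 4 genes.
example_counts = [6, 3, 1]
example_richness = allelic_richness(example_counts, 4)
```

Rarefying every population to the same number of genes is what makes richness values comparable across samples of unequal size; when the rarefaction size equals the sample size, the function simply returns the observed number of alleles.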
The expected heterozygosity within a sampled population at Hardy–Weinberg equilibrium was computed according to the following formula:
$$H = \frac{n}{n-1}\left(1 - \sum_{i=1}^{k} p_i^{2}\right)$$
where n is the sample size, k the number of alleles and p_i the frequency of the ith allele in the sample.
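As a rough sketch of this statistic, the snippet below assumes the usual sample-size-corrected estimator, n/(n − 1) multiplied by one minus the sum of squared allele frequencies, which matches the symbols n, k and p_i defined above. The function name and the example frequencies are illustrative assumptions.

```python
def expected_heterozygosity(allele_frequencies, sample_size):
    """Sample-size-corrected expected heterozygosity under Hardy-Weinberg.

    allele_frequencies: frequencies p_i of the k alleles (should sum to 1).
    sample_size: n, as defined in the text.
    """
    if sample_size < 2:
        raise ValueError("sample size must be at least 2")
    if abs(sum(allele_frequencies) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    # Expected homozygosity is the sum of squared allele frequencies.
    homozygosity = sum(p * p for p in allele_frequencies)
    return sample_size / (sample_size - 1) * (1.0 - homozygosity)


# Hypothetical locus with four alleles in a sample of size 100.
example_h = expected_heterozygosity([0.4, 0.3, 0.2, 0.1], 100)
```

The correction factor matters mostly for small samples; for large n the value approaches one minus the expected homozygosity.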
(b) Pathogen richness
Information on pathogen richness was extracted from the GIDEON database (http://www.gideononline.com/). This database provides information on the presence and the prevalence of infectious diseases in every country in the world. It is updated weekly through peer-reviewed publications in medical journals, abstracts of major meetings and national health ministry reports. The GIDEON database was recently used in a broad range of evolutionary ecology studies [28,34–36]. Information on pathogen richness used in the present study was extracted from the GIDEON database between July and October 2010. In order to relate the level of HLA polymorphism within a population and the pathogen environment of this population, we compiled the number (pr) of infectious diseases present in all countries for which we had information on HLA genetic diversity (n = 73 countries). The mean number (±s.d.) of pathogens per country was 214.15 ± 15.79, with a minimum number of pathogens found in the Azores (n = 183 pathogens) and a maximum number of pathogens found in Brazil (n = 250 pathogens). Then, because it is generally assumed that the level of polymorphism of HLA class I genes (A, B and C) will be better explained by the virus richness rather than by the whole set of pathogens including bacteria and parasites [28,37], we also compiled the number of distinct virus agents (both DNA and RNA viruses) for each country for which we had information on HLA. The mean number (±s.d.) of viruses per country was 41.98 ± 4.52, with a minimum number of viruses found in New Zealand (n = 34 viruses) and a maximum number of viruses found in India (n = 51 viruses). Electronic supplementary material, table S1 lists all pathogens used in this study.
(c) Effect of past colonization history
Since past colonization history has a strong impact in shaping genetic diversity in human populations, including diversity at the HLA complex, we included in the statistical analyses described below the geographical distance between the location of each population sample considered and East Africa. For each population, we computed the distance (in kilometres) from East Africa (taking Addis Ababa, 9.03 N, 38.74 E as the reference) across landmass, assuming that human populations did not cross large bodies of water during their migration history. Following Ramachandran et al., we used five obligatory waypoints to obtain estimates of the migration distances. Those waypoints were Anadyr, Russia (64 N, 177 E), Cairo, Egypt (30 N, 31 E), Istanbul, Turkey (41 N, 28 E), Phnom Penh, Cambodia (11 N, 104 E) and Prince Rupert, Canada (54 N, 130 W). To illustrate this for a population located in southeast Asia such as Malays, we computed the distance from East Africa taking the distance between Addis Ababa and Cairo, plus the distance between Cairo and Phnom Penh, plus the distance between Phnom Penh and the Malay population. Geographical distances (in kilometres) between all points of interest were computed using coordinates with the computer program Geodist.
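The waypoint calculation for the Malay example can be sketched as follows. This is only an illustration: it assumes great-circle (haversine) distances on a spherical Earth of radius 6371 km, which may differ from what Geodist computes, and the coordinates of the Malay sample are a hypothetical placeholder rather than a location from the dataset.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

# (latitude, longitude) in decimal degrees; reference and waypoints from the text.
ADDIS_ABABA = (9.03, 38.74)
CAIRO = (30.0, 31.0)
PHNOM_PENH = (11.0, 104.0)
MALAY_SAMPLE = (3.1, 101.7)  # hypothetical sampling location


def haversine_km(point_a, point_b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1 = map(radians, point_a)
    lat2, lon2 = map(radians, point_b)
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    # min() guards against rounding slightly above 1 for near-antipodal points.
    return 2 * EARTH_RADIUS_KM * asin(min(1.0, sqrt(h)))


def path_length_km(points):
    """Sum of great-circle distances along consecutive points of a route."""
    return sum(haversine_km(a, b) for a, b in zip(points, points[1:]))


# Route for a southeast Asian population: Addis Ababa, Cairo, Phnom Penh, sample.
route = [ADDIS_ABABA, CAIRO, PHNOM_PENH, MALAY_SAMPLE]
migration_distance_km = path_length_km(route)
```

Routing through the obligatory waypoints always gives a distance at least as long as the direct great-circle path, which is the intended effect of forcing the route to stay over land.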
(d) Effect of genetic drift
Because genetic drift reduces the level of polymorphism in a small-sized isolated population, this process may confound the examination of potential relationships between pathogen and/or virus richness and HLA polymorphism across all populations studied. To take such an effect into account, we also ran the statistical analyses after excluding the populations which were likely to have been subject to rapid genetic drift. In our dataset, this led us to exclude Amerindian (Region = America in electronic supplementary material, tables S2–S8) and Taiwanese (Country = Taiwan in electronic supplementary material, tables S2–S8) populations, reducing the number of populations to 65 for HLA-A, 57 for HLA-B, 42 for HLA-C, 44 for HLA-DPB1, 40 for HLA-DQA1, 69 for HLA-DQB1 and 70 for HLA-DRB1 (see electronic supplementary material, tables S2–S8 for the complete list of populations used).
(e) Statistical analysis
We used multiple linear regressions to measure how the richness in pathogens or, more specifically, viruses, explained the level of genetic diversity (both the allelic richness ar and the expected heterozygosity H) at each locus. In a first step, we measured how the geographical distance from East Africa explained the level of HLA genetic diversity. To that aim, we included in a linear regression model the statistic of genetic diversity (either ar or H) as the dependent variable and the geographical distance from East Africa as the independent variable. In a second step, since we found a significant role of migration history in shaping populations' HLA diversity (see §3), we included the statistic of pathogen abundance (either pathogen or virus richness) as a second independent variable in a multiple linear regression model. This allowed us to take into account the effect of migration history in our search for putative relationships between HLA genetic diversity and pathogen richness. We thus estimated the coefficient of determination r² as a measure of the proportion of variability of genetic diversity (allelic richness or expected heterozygosity) explained by the geographical distance from East Africa alone or by both the geographical distance from East Africa and the pathogen (or virus) richness. Normality of the data was improved using log-transformations. All tests were two-tailed and conducted using SPSS v. 18.0. Data are presented as means ± s.d. and differences were regarded as statistically significant at p < 0.05.
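The two-step regression described above can be illustrated with a small ordinary least-squares sketch. It is not the SPSS procedure used in the study: it is a plain-Python stand-in that fits the model through the normal equations, omits significance tests and log-transformations, and runs on invented numbers chosen only to show how r² is compared between the distance-only model and the model with pathogen richness added.

```python
def solve_linear_system(matrix, rhs):
    """Solve a small square system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - tail) / aug[r][r]
    return solution


def ols_r2(response, *predictors):
    """Fit response ~ intercept + predictors; return (coefficients, r squared)."""
    rows = [[1.0, *values] for values in zip(*predictors)]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, response)) for i in range(p)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(response) / len(response)
    ss_res = sum((y - f) ** 2 for y, f in zip(response, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in response)
    return beta, 1.0 - ss_res / ss_tot


# Invented illustration data (not from the study): six populations.
distance_thousand_km = [2.0, 5.0, 9.0, 14.0, 20.0, 25.0]
pathogen_richness = [230.0, 215.0, 205.0, 220.0, 190.0, 240.0]
heterozygosity = [0.93, 0.91, 0.88, 0.86, 0.80, 0.78]

# Step 1: distance from East Africa alone. Step 2: distance plus pathogen richness.
_, r2_distance = ols_r2(heterozygosity, distance_thousand_km)
_, r2_both = ols_r2(heterozygosity, distance_thousand_km, pathogen_richness)
```

Because the second model nests the first, its r² can never be lower; the question of interest is whether the added predictor explains a significant extra share of the variance, which requires the significance test that this sketch leaves out.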
3. Results
(a) Human leucocyte antigen diversity and geographical distance
We computed the allelic richness ar and the expected heterozygosity H for every population sample and all seven HLA class I and class II loci. Average values of both statistics on all populations may be found in table 1. HLA-B is the locus with the highest allelic richness ar (18.52) and heterozygosity H (0.90), while HLA-DQA1 has the lowest ar (6.59) and HLA-DPB1 the lowest H (0.72).
We then tested the correlation between each of the two statistics measuring genetic diversity (allelic richness ar and expected heterozygosity H) and the geographical distance from East Africa through landmass (table 2). Strong and significant negative correlations between geographical distance from East Africa and both statistics are found for each locus tested (class I and class II). The global correlation between increasing geographical distance and decreasing genetic diversity remains significant after Bonferroni's correction for multiple tests (threshold α′ = 0.007). Allelic richness decreases generally faster than heterozygosity.
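The corrected threshold quoted above is consistent with dividing the nominal significance level by the number of loci tested. The worked arithmetic below assumes that the correction was applied over m = 7 tests (one per locus), which the text does not state explicitly:

```latex
\alpha' = \frac{\alpha}{m} = \frac{0.05}{7} \approx 0.0071
```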
(b) Human leucocyte antigen diversity and pathogens
We also computed the correlation between genetic diversity (both ar and H) and pathogen richness, taking into account the geographical distance from East Africa. We considered both all pathogens together and viruses only, which we compared with both class I and class II diversity. Indeed, although the main function of HLA class I molecules is to display virus-derived intracellular peptides to CD8+ cytotoxic T cells and that of HLA class II molecules to present peptides derived from extracellular parasites to CD4+ helper T cells, the mechanisms of the immune response appear to be much more complex; for example, the action of HLA class II molecules triggering an antibody-mediated response may also be necessary in viral infection (W. Reith 2010 and D. Pinschewer 2010, personal communications).
With all pathogens, a significant positive correlation between genetic diversity (only H) and pathogen richness is found for HLA-B, while a significant negative correlation is found for HLA-DQB1. When considering only viruses, the results do not change much. HLA-B shows a significant positive correlation for the two indices of genetic diversity (both ar and H) and the correlation becomes significant for HLA-A (only H). HLA-DQB1 is still the only class II locus with a significant negative correlation. There is no significant correlation between pathogen richness and genetic diversity for HLA-C, -DPB1, -DQA1 and -DRB1.
(c) Genetic drift effect
When small-sized populations which were likely to have passed through episodes of rapid genetic drift are removed (Amerindian and Taiwanese populations), the correlation between genetic diversity (both ar and H) and the geographical distance from East Africa through landmass remains significant except for HLA-DQA1 and -DQB1 (electronic supplementary material, table S9). The significance of the correlation with H, but not with ar, also disappears for HLA-DPB1. On the other hand, when Amerindians and Taiwanese are removed, no correlation between genetic diversity and pathogen richness remains significant (electronic supplementary material, table S10), i.e. for HLA-B between allelic richness (ar) and pathogen or virus richness (r² = 0.19, p = 0.679 and r² = 0.19, p = 0.994, respectively) and between heterozygosity (H) and pathogen or virus richness (r² = 0.22, p = 0.216 and r² = 0.21, p = 0.434, respectively). The positive correlation found for HLA-B when all populations are considered is thus weak, as its significance depends on the presence of some peculiar Amerindian and/or Taiwanese populations, which exhibit a high genetic diversity in pathogen-rich environments (figure 1).
The significant negative correlation between genetic diversity and pathogen richness at HLA-DQB1 also vanishes when Amerindian populations (there are no Taiwanese samples here for HLA-DQB1) are not considered, i.e. between allelic richness (ar) and pathogen or virus richness (r² = 0.01, p = 0.414 and r² = 0.0, p = 0.997, respectively) and between heterozygosity (H) and pathogen or virus richness (r² = 0.04, p = 0.740 and r² = 0.04, p = 0.670, respectively). It thus seems that the negative correlation found at HLA-DQB1 is weak, as its significance depends on the presence of some peculiar Amerindian populations, which exhibit a very low diversity despite living in pathogen-rich environments (figure 2).
The crucial role of humans' migration history in shaping HLA genetic patterns worldwide is revealed in this study for seven classical class I (A, B and C) and class II (DPB1, DQA1, DQB1 and DRB1) loci. This research extends our previous results showing correlations between genetic and geographical distances among populations [6,23], as it reveals specifically a correlation between HLA genetic diversity and geographical distance from East Africa for a number of populations from all continents at both class I and class II loci. Like Prugnolle et al., we thus made the preliminary assumption that modern humans migrated from this region, which harbours a very high genetic diversity [41–43], to expand worldwide, as suggested by a number of population genetics studies [38,44–49], a scenario often referred to as ‘Out-of-Africa’. A testable prediction resulting from this model is that modern human populations would have lost genetic diversity through a very large number of small founder effects during their geographical expansion, as proposed on the basis of the observation of decreasing diversity from East Africa at neutral autosomal microsatellites [38,46]. This is also what we found in this study, in agreement with Prugnolle et al. for the three HLA class I loci HLA-A, -B and -C, and for the first time for the four HLA class II loci HLA-DRB1, -DQA1, -DQB1 and -DPB1. On the other hand, distance from East Africa only explains between 12 and 35 per cent of the genetic variation when estimated by the expected heterozygosity H, and from 18 to 43 per cent when estimated by the allelic richness ar at each locus, the latter decreasing faster than the former with the distance from East Africa (table 2). This is much less than the 85 per cent of variance explained by the neutral markers mentioned above.
We also checked that these correlations were not completely induced by the presence of small-sized Amerindian or Taiwanese populations, which are very distant geographically from East Africa and which would have undergone particularly strong bottlenecks and/or rapid genetic drift during their colonization of America or their isolation in Taiwan, respectively. When Amerindians and Taiwanese are removed from the analysis, the correlation between allelic richness ar and distance from Africa remains significant at all loci except DQA1 and DQB1 (electronic supplementary material, table S9). The same results are obtained when genetic diversity is estimated by the heterozygosity H (instead of the allelic richness ar), except for DPB1 whose correlation becomes non-significant. This last result is somewhat surprising because HLA-DPB1 is generally considered as the most neutral among all HLA loci, with the highest level of genetic differentiation among populations (ΦST = 14.5%, compared with 6–9% for the other loci except DQA1, with 13.1%), and is thus, in principle, particularly useful to infer human peopling history. However, like HLA-DQA1 and -DQB1, this locus exhibits a low level of genetic diversity in most human populations (table 1) and cases of significant deviations from selective neutrality towards an excess of homozygotes are sometimes observed: in this case, removing the most distant and least diversified populations from the data would have been sufficient to erase some direct signatures of human colonization, at least when measured by the slow decrease in heterozygosity with geographical distance from East Africa. On the other hand, these traces are still detected for HLA-DPB1 when allelic richness instead of heterozygosity is used. Hence we may say that the genetic patterns of at least five of the seven HLA loci investigated in this study, i.e. HLA-A, -B, -C, -DRB1 and -DPB1, are compatible with an expansion model of modern humans from East Africa.
The important role of geographical migrations and demography in shaping HLA global genetic structure confirms that HLA markers can be used for inferring human settlement history, even if natural selection may distort such patterns in some particular situations, e.g. as has been shown for HLA-DRB1 across the geographical barrier of the Strait of Gibraltar.
HLA-DQA1 and -DQB1, for which the correlation between genetic diversity (estimated by both ar and H indices) and geographical distance from East Africa vanishes in the absence of populations strongly subject to genetic drift, both show a decreasing level of genetic diversity with increasing pathogen richness (only -DQB1 is statistically significant), an extremely surprising result. This may indicate a very peculiar mode of natural selection which does not correspond to the classical PDBS hypothesis. Actually, even though the correlation between HLA genetic diversity (either allelic richness or expected heterozygosity) and pathogen richness is significant for HLA-B and -DQB1, when either all pathogens or viruses only are considered, and for HLA-A, when only viruses are tested, it loses its significance when Amerindian and/or Taiwanese populations are removed from the worldwide dataset. The PDBS model is thus not strongly supported by this approach.
A noteworthy result is the inverse relationship found between these variables for different HLA class I and class II loci (table 3). HLA class I loci (A, B and C) tend to exhibit a positive relationship between genetic diversity and pathogen richness, suggesting heterozygote advantage as the main mechanism of pathogen-mediated selection, in agreement, although weakly, with the PDBS model. On the other hand, HLA class II loci (actually DQA1, DQB1 and DRB1) tend to reveal a negative relationship, indicating that other mechanisms may be at work. Actually, this negative relationship for class II molecules is much more pronounced for HLA-DQB1 than for HLA-DRB1, HLA-DQA1 lying in between. Functional differences between HLA class I and class II molecules probably explain these divergent results, although specific mechanisms remain to be investigated. A tentative explanation may be found in the different way HLA class I and class II molecules bind antigenic peptides. Class I-bound peptides are short (approx. 8–10 amino acids) and completely anchored with a high binding affinity within the PBS, which is composed, as mentioned above, of a portion of the single α chain. On the contrary, class II-bound peptides are long (approx. 12–25 amino acids), incompletely inserted (with side chains lying outside) in the peptide-binding groove, which in this case is composed of combined portions of the two chains α and β. Class II molecules thus display a lower specificity (or restriction) and affinity of peptide-binding, and, among them, HLA-DQ heterodimers are likely to be the more promiscuous [50,51]. These observations may explain our findings. Indeed, for class I molecules, a high allelic diversity of the α chain, as attested by the mean allelic richness of each locus presented in table 1, would be advantageous in a virus-rich environment because each virus-derived peptide would need a very specific binding groove to be presented to CD8+ T cells. 
Actually, this may also be the case for HLA-DRB1 whose allelic variation in human populations falls within the range of the values found for HLA class I loci (table 1), and for which more than 1000 alleles coding for almost 800 different proteins are currently known (see http://hla.alleles.org/nomenclature/stats.html). Note also that unlike for the other class II molecules, the α chain of DR αβ heterodimers is almost monomorphic (only two subtypes, DRA*01:01 and DRA*01:02, differing by non-synonymous substitutions are known, see http://hla.alleles.org/class2.html); therefore, variation at the DR β chain would be crucial to guarantee specific peptide presentations in a manner similar to the class I loci. Another argument is that human populations generally exhibit highly divergent HLA-A, -B, -C and -DRB1 alleles in terms of pairwise nucleotide differences (with often more than 20 differences among DNA sequences at exon 2 (and exon 3 for class I)), suggesting that these loci may have evolved according to a model of asymmetric balancing selection whereby heterozygotes for molecularly distant alleles would have a higher fitness, a hypothesis that is worth testing in future research when DNA sequence data for those loci are available. The decrease of genetic diversity with pathogen richness found for HLA-DRB1, whose slope is actually much less pronounced than for HLA-DQA1 and -DQB1 despite the much higher number of populations considered (table 3), may then be irrelevant.
By contrast, a limited number of DQ αβ heterodimers are found at high frequencies in human populations worldwide. Previous results have shown that particular DQA1 and -DQB1 allelic combinations lead to unstable DQ molecules [53,54], suggesting that these loci are subject to purifying selection owing to structural constraints, with the possible consequence of slowing down the rate of nucleotide divergence among their alleles. This may explain why DQB1 alleles are more related to each other from a molecular point of view than the alleles of other loci (except DPB1). A paradox that has recently been underlined is that whereas different HLA-DQ αβ heterodimers exhibit highly divergent peptide-binding motifs, they share largely overlapping peptide-binding repertoires. A plausible explanation is that HLA-DQ PBS would play a minor role in peptide-binding compared with lateral interactions involving side-chains. The loss of PBS role in antigen recognition (i.e. recognition degeneracy as described by Stoffels & Spencer) would have been essential for HLA-DQ to guarantee immune protection against a large variety of pathogens despite the existence of only a few stable αβ heterodimers. A further conclusion is that HLA loci subject to different selective forces (balancing selection in the case of HLA-A, -B, -C and -DRB1, and purifying selection in the case of HLA-DQA1 and -DQB1) would have followed distinct evolutionary strategies to provide efficient immune protection in pathogen-rich environments.
The present study brings new evidence that molecular variation at both class I and class II HLA loci, with the possible exception of HLA-DQA1 and -DQB1, reveals significant signatures of past migrations of modern humans throughout the world, as a general pattern of decreasing genetic diversity with increasing geographical distance from East Africa is confirmed for three HLA class I loci (A, B, C) and is shown for two HLA class II loci (DRB1, DPB1), even when genetic drift effects owing to remote populations are controlled. In addition to previous evidence suggesting that these polymorphisms mainly evolve under the influence of geographical and demographic expansions of human populations, these results indicate that most HLA markers are useful tools to infer migration history. A small part of the HLA genetic variation may also be explained by a response to pathogen richness in different environments for HLA-B (compatible with other studies suggesting the strongest selection for this locus [56–58]), and, to a lesser extent, HLA-A, in agreement with the PDBS model, although this effect is no longer significant when Amerindian and Taiwanese populations are excluded from the data. The present study also describes for the first time the relationship between genetic diversity and pathogen richness at the HLA class II loci. The most surprising result is the highly significant, negative correlation found for HLA-DQB1, although statistical significance disappears again when Amerindians are excluded.
The comparisons of the different HLA loci for their amount of genetic diversity observed in human populations, their peptide-binding characteristics and the observed relationships between their genetic diversity and pathogen richness led us to suggest that they followed distinct evolutionary strategies in pathogen-rich environments: whereas HLA-A, -B, -C, and probably -DRB1, accumulated allelic diversity to ensure an efficient immune response in such environments, HLA-DQA1 and -DQB1 relaxed the restriction of their PBS to maintain their protective role against pathogens despite a strong selective pressure against the formation of a large variety of DQ αβ heterodimers. These hypotheses should of course be considered as tentative, and both more immunogenetic investigation, e.g. to assess peptide-binding specificities or repertoire overlaps between class I and class II loci, and more robust biostatistical results are needed to better understand the molecular evolution of the complex HLA polymorphism.
We would like to thank Stéphane Buhler for helping us with the Gene[VA] large worldwide HLA database, as well as Prof. Walter Reith and Daniel Pinschewer (University of Geneva) for useful discussions. This work received financial support from the Swiss National Science Foundation (SNF, Switzerland) grants no. 3100A0—112651 and 31003A—127465 (A.S.M.) and the ESF (Europe) COST grant of Action BM0803 ‘HLA-NET’ (A.S.M.).
One contribution of 14 to a Discussion Meeting Issue ‘Immunity, infection, migration and human evolution’.
- This journal is © 2012 The Royal Society