Royal Society Publishing

Recombination rate variation and speciation: theoretical predictions and empirical results from rabbits and mice

Michael W. Nachman, Bret A. Payseur

Abstract

Recently diverged taxa may continue to exchange genes. A number of models of speciation with gene flow propose that the frequency of gene exchange will be lower in genomic regions of low recombination and that these regions will therefore be more differentiated. However, several population-genetic models that focus on selection at linked sites also predict greater differentiation in regions of low recombination simply as a result of faster sorting of ancestral alleles even in the absence of gene flow. Moreover, identifying the actual amount of gene flow from patterns of genetic variation is tricky, because both ancestral polymorphism and migration lead to shared variation between recently diverged taxa. New analytic methods have been developed to help distinguish ancestral polymorphism from migration. Along with a growing number of datasets of multi-locus DNA sequence variation, these methods have spawned a renewed interest in speciation models with gene flow. Here, we review both speciation and population-genetic models that make explicit predictions about how the rate of recombination influences patterns of genetic variation within and between species. We then compare those predictions with empirical data of DNA sequence variation in rabbits and mice. We find strong support for the prediction that genomic regions experiencing low levels of recombination are more differentiated. In most cases, reduced gene flow appears to contribute to the pattern, although disentangling the relative contribution of reduced gene flow and selection at linked sites remains a challenge. We suggest fruitful areas of research that might help distinguish between different models.

1. Introduction

The geographical context and genetic details of how new species arise have been major topics of evolutionary research. Based on geographical patterns of phenotypic variation in birds, Mayr [1] argued that geographical isolation is a common first step in the origin of species. Owing largely to the influence of both Mayr [1] and Dobzhansky [2], allopatric models of speciation have dominated ideas about how new species arise for much of the last 70 years. Over the last dozen years, however, there has been a renewed interest in the possibility that speciation may occur in the presence of gene flow.

This renewed interest in ‘speciation with gene flow’ comes from at least three places. First, recent theoretical models have demonstrated that selection can drive speciation in the face of gene flow [3–5]. Second, there are now quite a few detailed empirical studies where geographical isolation seems an unlikely explanation for observed patterns, including work on true fruit flies [6,7], cichlids [8], palms [9], sticklebacks [10] and many others (reviewed by Bolnick & Fitzpatrick [11]). In these examples, natural or sexual selection is thought to have driven changes that led to the origin of new species in the absence of geographical isolation. Third, new statistical tools enable us to measure gene flow between recently diverging populations that are not at migration–drift equilibrium and to distinguish this gene flow from incomplete lineage sorting as causes of shared variation [12–17]. Some of these methods have been used quite widely. Although a majority of 49 studies that used an isolation-with-migration (IM) model yielded no evidence of gene exchange, many detected significant gene flow [18]. Finding evidence of gene flow does not mean that speciation occurred in sympatry. There is a range of other possibilities, including parapatric models, allopatric models followed by secondary contact and a variety of more complicated scenarios.

Some taxa may have experienced multiple periods of contact and isolation as ranges expanded and contracted. Periods of contact can offer opportunities for gene exchange, provided that reproductive isolation is not complete. Determining the timing of gene flow is a challenging task [19]. Finding evidence of gene flow also does not mean that speciation was necessarily driven by selection. Mutations that arise and fix (either through drift or selection) in allopatric populations can lead to negative epistatic interactions upon secondary contact, as suggested by Bateson [20], Dobzhansky [2] and Muller [21] (BDM incompatibilities). Such BDM incompatibilities are expected to restrict gene flow, but if some hybrids survive and reproduce, then some genomic regions may be permeable to gene flow, even as other regions differentiate [22].

A key challenge now is to understand the conditions under which gene flow may occur as young taxa diverge. Can we make explicit predictions about which genomic regions are likely to be important in the early stages of speciation? The rate of recombination is a central component of models used to interpret variation in levels of differentiation across the genome. Here, we review speciation models and population-genetic models that invoke variation in recombination rate. We outline specific predictions that follow from these models. We then review methods for evaluating these predictions. Finally, we assess these models in the light of empirical data from rabbits and mice.

2. Recombination rate variation and models of differentiation

(a) Speciation models with gene flow

Ten years ago, several authors proposed the idea that chromosomal rearrangements that distinguish recently diverged taxa might facilitate the origin of reproductive isolation in the face of hybridization [23–28]. The rationale is that hybrids formed between taxa that differ by chromosomal rearrangements, such as inversions, will experience suppressed recombination in rearranged regions of the genome. Genes contributing to reproductive isolation may accumulate in such regions and may continue to diverge in the face of hybridization. These models are therefore fundamentally genic models of speciation (in contrast to earlier chromosomal models of speciation, which depend on the underdominance of the chromosomal rearrangements themselves; [29]). These models predict that the genomes of recently diverged species will be divided into high-differentiation and low-differentiation regions. For example, in crosses between Drosophila persimilis and Drosophila pseudoobscura, genes contributing to reproductive isolation map preferentially to the few inverted regions of the genome [24], and these regions also show greater genetic differentiation than collinear regions of the genome [30]. The models of Noor et al. [24], Rieseberg [25] and Navarro & Barton [23] differ in detail but share the key feature that chromosomal rearrangements suppress recombination when heterozygous.

Even between taxa with collinear genomes, genes contributing to isolation are expected to accumulate in regions of suppressed recombination [22,26] and consequently, gene flow is expected to be reduced in such regions. It is important to recognize the role of selection in this situation: gene flow is reduced because ‘isolation alleles’ are selectively removed when introduced into the sister species. Because these models are concerned with the origin of reproductive isolation between incipient species, we refer to them collectively as ‘speciation models’.

(b) Population-genetic models without gene flow

In contrast to these speciation models, there are a number of models that seek to explain how selection at linked sites can affect levels and patterns of genetic variation within a single population. We refer to these as ‘population-genetic models’. Selection in regions with reduced recombination can reduce levels of genetic variation within populations in two primary ways. First, a new beneficial mutation may quickly rise in frequency owing to positive selection, and result in the associated fixation of linked neutral sites (i.e. ‘genetic hitchhiking’; [31]). Positive selection may also act on standing variation, recurrent mutations or alleles introduced by migration (i.e. ‘soft sweeps’; [32–34]), and in this case, the reduction in levels of variation at linked sites is expected to be weaker. Second, purifying selection against new deleterious mutations may reduce levels of variation at linked sites through a process termed background selection [35]. Many studies have demonstrated that genomic regions with reduced recombination have lower levels of genetic variation [36–38], though distinguishing between the potential causes of these patterns has proved remarkably difficult [39].

Regardless of the cause of reduced variation in regions of low recombination within populations, this pattern will lead to increased estimates of differentiation between populations when statistics such as FST are used [40,41], even without gene flow. Because FST is based on a comparison of within-population diversity to between-population diversity, anything that reduces the former will inflate FST [41]. A negative correlation between FST and recombination rate has been observed in empirical studies ranging from a handful of loci [42–44] to the whole genome [45].
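The dependence of FST on within-population diversity can be made concrete with a Hudson-style estimator, FST = 1 − πwithin/πbetween. This particular estimator is our choice for illustration and is not necessarily the one used in the studies cited, and the diversity values below are hypothetical. Holding between-population divergence fixed while halving within-population diversity, as linked selection might do in a low-recombination region, raises FST with no change in gene flow.

```python
def hudson_fst(pi_within: float, pi_between: float) -> float:
    """Hudson-style FST: 1 - (mean within-population diversity) / (between-population diversity)."""
    if pi_between <= 0:
        raise ValueError("between-population diversity must be positive")
    return 1.0 - pi_within / pi_between


# Hypothetical values, not taken from any study discussed here: the same
# between-population divergence, but within-population diversity halved in
# the low-recombination region.
pi_between = 0.010
high_rec = hudson_fst(pi_within=0.008, pi_between=pi_between)
low_rec = hudson_fst(pi_within=0.004, pi_between=pi_between)
print(f"high recombination: FST = {high_rec:.2f}")
print(f"low recombination:  FST = {low_rec:.2f}")
```

Here FST rises from 0.2 to 0.6 purely because the denominator's share of within-population variation shrinks.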

(c) Predictions from models

The models described above can be used to generate explicit predictions about patterns of genetic variation and differentiation in the early stages of divergence. Both the speciation models and the models of selection at linked sites predict that differentiation will be greater in regions of low recombination. However, differentiation can be measured in a variety of ways, and there are subtle differences that may provide a means of distinguishing between alternative models (table 1).

Table 1.

Alternative predictions from models of speciation with gene flow and population-genetic models of selection at linked sites for patterns of variation within and between species in genomic regions of low- and high-recombination rates.

The key prediction from all three speciation models is that gene flow will be lower in genomic regions experiencing less recombination. Thus, fewer shared polymorphisms are expected in such regions, and levels of differentiation should be greater. Figure 1a shows gene genealogies in regions of low and high recombination, in the absence of genetic hitchhiking or background selection. Gene flow is assumed to occur in regions of high recombination but not in regions of low recombination because of linkage to genes involved in isolation. Under these conditions, nucleotide diversity (π) is expected to equal 4Neμ at equilibrium in the low-recombination region, while π will be larger in the high-recombination region. If the lineages have been separated for a short time, then ancestral polymorphisms may still be segregating (not shown) and thus populations may not be at equilibrium. Nonetheless, gene flow is still expected to increase levels of nucleotide diversity. Without gene flow, net nucleotide divergence, Da [46], provides an estimate of 2μt, where μ is the neutral mutation rate and t is the time of separation of the lineages. With gene flow, Da provides an underestimate of this quantity. Similarly, Dxy [46]—the average number of pairwise differences between alleles sampled from the two populations—provides an estimate of 2μt + 4Neμ, where Ne is the ancestral population size, in cases without gene flow, but provides an underestimate of this quantity in cases with gene flow.
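To make the three quantities concrete, the sketch below computes π, Dxy and Da from aligned sequences, using the relation Da = Dxy − (π1 + π2)/2. It assumes gap-free alignments of equal length and uses raw proportions of differing sites with no correction for multiple substitutions; the function names and toy sequences are hypothetical.

```python
from itertools import combinations, product


def p_distance(a: str, b: str) -> float:
    """Proportion of aligned sites at which two sequences differ (no multiple-hit correction)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def nucleotide_diversity(seqs: list[str]) -> float:
    """pi: mean pairwise difference among sequences sampled from one population."""
    pairs = list(combinations(seqs, 2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


def dxy(pop1: list[str], pop2: list[str]) -> float:
    """Dxy: mean pairwise difference between sequences drawn from different populations."""
    pairs = list(product(pop1, pop2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


def net_divergence(pop1: list[str], pop2: list[str]) -> float:
    """Da: Dxy minus the average of the two within-population diversities."""
    mean_pi = (nucleotide_diversity(pop1) + nucleotide_diversity(pop2)) / 2
    return dxy(pop1, pop2) - mean_pi


# Hypothetical toy alignment of 10 sites: two fixed differences between the
# populations plus one polymorphism present in both.
pop_a = ["AAAAAAAAAA", "AAAAAAAAAT"]
pop_b = ["CCAAAAAAAA", "CCAAAAAAAT"]
print(nucleotide_diversity(pop_a), dxy(pop_a, pop_b), net_divergence(pop_a, pop_b))
```

For this toy alignment π is 0.1 in each population, Dxy is 0.25 and Da is 0.15; the shared polymorphism raises Dxy above the value of 0.2 set by the two fixed differences, while Da subtracts it back out.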

Figure 1.

Comparison of expected patterns of variation in regions of low recombination and regions of high recombination under (a) models of speciation with gene flow and (b) population-genetic models of selection at linked sites without gene flow. See also legend to table 1.

A key prediction from models of selection at linked sites is that positive or negative selection will influence patterns of genetic variation at linked sites more in regions of low recombination than in regions of high recombination. Figure 1b shows gene genealogies in regions of low and high recombination, in the absence of gene flow but in the presence of selection at linked sites due to genetic hitchhiking or background selection. In low-recombination regions, selection at linked sites is expected to reduce levels of nucleotide variation, whereas in regions of high recombination, such effects will be negligible. Most theoretical treatments of the effects of selection at linked sites on neutral differentiation have focused on FST. Using simulations, Charlesworth et al. [47] showed that background selection increases FST between a pair of populations connected by a low degree of gene flow. The effect is primarily caused by the reduction in diversity within populations. Hu & He [48] used two- and three-locus models to demonstrate that the elevation in FST generated by background selection also applies with an arbitrary number of populations (in the island model) and is closely tied to levels of linkage disequilibrium (LD) between neutral and deleterious alleles. Slatkin & Wiehe [49] showed that genetic hitchhiking causes transient increases in FST by distorting neutral allele frequencies. It should be noted that another contributing factor to the positive correlation between within-population diversity and recombination rate—the association of recombination with mutation [50]—predicts no correlation between FST and recombination rate.

The effects of selection at linked sites on other measures of differentiation have received less attention. As a consequence of reduced nucleotide diversity, Da will be increased in regions of low recombination relative to regions of high recombination. Importantly, however, Dxy is expected to behave differently under models of selection at linked sites compared with models of speciation (table 1 and figure 1) [51]. Gene flow will reduce Dxy in regions of high recombination (figure 1a). Selection at linked sites will generally not affect Dxy, unless selection is also operating in the ancestral population, in which case the time to coalescence for alleles from the two daughter populations will be shorter in regions of low recombination, thus reducing Dxy in such regions. It remains to be seen whether measures of differentiation developed to reduce dependence on within-population diversity [52,53] show different signals under speciation with gene flow and selection at linked sites.

Table 2 provides additional tests that may help us to distinguish speciation models from models of selection at linked sites as explanations for increased differentiation in regions of low recombination. Several newer methods for detecting gene flow, described below, may be useful for identifying the expected increase in introgression in regions of high recombination under speciation models. In addition, some tests for selection may help identify the expected signatures of selection in low-recombination regions. It is important to bear in mind, however, that some traditional tests of selection, which can be quite powerful in other situations, may be uninformative in this context. For example, a lower ratio of polymorphism to divergence is expected in low-recombination regions under models of selection at linked sites as well as under models of reduced gene flow. Thus, rejection of the null model using the Hudson–Kreitman–Aguadé test [54] does not help us to distinguish between these competing sets of models.

Table 2.

Tests that may help distinguish between speciation and population-genetic models as explanations for greater differentiation in genomic regions with low recombination.

3. Methods for measuring introgression

Assuming that migration follows an island model and that migration–drift equilibrium has been reached, the number of migrants per generation (Nm) can be related directly to levels of differentiation from the expectation FST = 1/(4Nm + 1). However, recently diverged lineages are generally not expected to be at equilibrium [55]. Under these conditions, connecting patterns of differentiation to levels of gene flow is difficult since shared polymorphism may arise as a consequence of migration or from unsorted polymorphisms that remain from the ancestral population. Several solutions to this problem have emerged in the last decade.
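Under the equilibrium island model the expectation can simply be inverted, Nm = (1/FST − 1)/4. The short sketch below does this; the function name is hypothetical, and, as the paragraph stresses, the number it returns is only meaningful if the island-model and equilibrium assumptions hold, which is rarely true for recently diverged lineages.

```python
def island_model_nm(fst: float) -> float:
    """Return Nm implied by FST = 1/(4Nm + 1), i.e. Nm = (1/FST - 1)/4.

    Meaningful only under the island model at migration-drift equilibrium;
    shared ancestral polymorphism in young lineages will be misread as migration.
    """
    if not 0.0 < fst <= 1.0:
        raise ValueError("FST must lie in (0, 1]")
    return (1.0 / fst - 1.0) / 4.0


for fst in (0.05, 0.2, 0.5, 0.9):
    print(f"FST = {fst:.2f} -> Nm = {island_model_nm(fst):.3f}")
```

For example, FST = 0.2 corresponds to one migrant per generation and FST = 0.5 to a quarter of a migrant.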

The most widely used set of methods was developed by Hey, Nielsen and Wakeley, and these methods are known as IM models [12–14]. The original model of Nielsen & Wakeley [12] includes a single ancestral population that splits into two descendant populations which exchange genes. The model includes six parameters: three for population size (the ancestral population and each contemporary population), divergence time and migration rates into each population. Parameters are estimated jointly in the model using a Markov chain Monte Carlo (MCMC) simulation that incorporates stochastic variation in gene genealogies. The method provides estimates of parameter values as well as tests of alternative models in a likelihood framework. Subsequent extensions to this basic model now allow the inclusion of multiple loci, relax the assumption of constant population size and permit the examination of more than two populations [13,14,56,57].

Another method for estimating gene flow in a non-equilibrium context uses the joint allele-frequency spectrum for two or more populations. For example, Gutenkunst et al. [16] used forward simulations with a diffusion approximation to derive demographic parameters for four human populations in a likelihood framework.
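The joint allele-frequency spectrum itself is simple to tabulate: each SNP is placed in a matrix cell indexed by its derived-allele count in each population, and shared polymorphisms occupy the interior cells. The sketch below only builds this summary; it assumes alleles have already been polarized (for example with an outgroup) and that there is no missing data, and it does not attempt the diffusion-based model fitting of Gutenkunst et al. [16]. Function names and counts are hypothetical.

```python
def joint_sfs(counts1, counts2, n1, n2):
    """Unfolded two-population site frequency spectrum.

    counts1[k], counts2[k]: derived-allele counts at SNP k in samples of n1 and
    n2 chromosomes. Entry [i][j] of the result is the number of SNPs with i
    derived copies in population 1 and j derived copies in population 2.
    """
    if len(counts1) != len(counts2):
        raise ValueError("both populations must be scored at the same SNPs")
    sfs = [[0] * (n2 + 1) for _ in range(n1 + 1)]
    for i, j in zip(counts1, counts2):
        if not (0 <= i <= n1 and 0 <= j <= n2):
            raise ValueError("derived-allele count outside sample size")
        sfs[i][j] += 1
    return sfs


def shared_polymorphisms(sfs):
    """Count SNPs segregating in both populations (interior cells of the matrix)."""
    n1, n2 = len(sfs) - 1, len(sfs[0]) - 1
    return sum(sfs[i][j] for i in range(1, n1) for j in range(1, n2))


# Hypothetical derived-allele counts at four SNPs, four chromosomes per population.
sfs = joint_sfs([1, 4, 0, 2], [0, 4, 3, 2], n1=4, n2=4)
for row in sfs:
    print(row)
print("shared polymorphisms:", shared_polymorphisms(sfs))
```

In this toy example one SNP is private to each population, one is fixed in both, and one is shared.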

A third approach to detecting gene flow under non-equilibrium conditions makes use of specific patterns in haplotype structure or LD. Machado et al. [58] pointed out that LD should be higher between linked single nucleotide polymorphisms (SNPs) introduced by recent gene flow, since they will be younger in the population, on average, than SNPs that remain from the ancestral population. In this test, the levels of LD for shared SNPs are compared with the levels of LD for SNPs that are exclusive to each population. More recently, Pool & Nielsen [59] developed a simulation method based on the erosion of migrant tract lengths. This method allows migration parameters to be estimated in a likelihood framework.
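The comparison at the heart of the Machado et al. [58] test can be sketched as the mean LD among shared SNPs versus the mean LD among exclusive SNPs. The version below uses r² as the LD measure and assumes phased haplotypes scored as 0/1; both choices, the function names and the toy haplotypes are ours, and the significance assessment of the real test (which needs an appropriate null distribution) is not included.

```python
from itertools import combinations


def r_squared(snp_a, snp_b):
    """r^2 between two biallelic SNPs scored (0/1) on the same set of haplotypes."""
    if len(snp_a) != len(snp_b):
        raise ValueError("SNPs must be scored on the same haplotypes")
    n = len(snp_a)
    p_a = sum(snp_a) / n
    p_b = sum(snp_b) / n
    p_ab = sum(a * b for a, b in zip(snp_a, snp_b)) / n
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic in the sample")
    return d * d / denom


def mean_r_squared(snps):
    """Average r^2 over all pairs in a set of SNPs."""
    pairs = list(combinations(snps, 2))
    return sum(r_squared(a, b) for a, b in pairs) / len(pairs)


# Hypothetical data: each inner list is one SNP scored on the same 4 haplotypes.
shared_snps = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 1, 1]]
exclusive_snps = [[0, 0, 1, 1], [0, 1, 0, 1], [0, 1, 1, 0]]
print("shared:", mean_r_squared(shared_snps))
print("exclusive:", mean_r_squared(exclusive_snps))
```

Recently introgressed SNPs travel together on migrant haplotypes, so under recent gene flow the first average is expected to exceed the second.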

Another approach, approximate Bayesian computation (ABC), combines Bayesian inference with coalescent simulations in which the data are represented by a set of summary statistics [60–62] (reviewed by Beaumont [63]). Because it uses summaries of the data, ABC is computationally fast and can handle large datasets and complex demographic scenarios. ABC methods are highly flexible, requiring only the ability to simulate datasets under models of interest. Posterior distributions of parameters, such as rates of gene flow, can be reconstructed using a variety of techniques, including rejection sampling [61], MCMC without likelihoods [64] or local-linear regression of simulated parameter values on simulated summary statistics [62].
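Rejection sampling [61] reduces ABC to a short loop: draw a parameter from the prior, simulate a summary statistic, and keep the draw if the simulated summary lies close to the observed one. In the sketch below the "simulator" is a deliberately trivial, hypothetical function standing in for a coalescent simulator, and the prior, tolerance and observed value are arbitrary. A real analysis would simulate genealogies under an IM-type model, use several summary statistics and might add the regression adjustment of [62].

```python
import random
from statistics import mean


def toy_simulator(m, n_snps, rng):
    """Hypothetical stand-in for a coalescent simulator.

    Returns the fraction of SNPs shared between two populations, assuming
    (purely for illustration) that each SNP is shared with probability m / (m + 1).
    """
    p_shared = m / (m + 1.0)
    return sum(rng.random() < p_shared for _ in range(n_snps)) / n_snps


def abc_rejection(observed, n_draws, tolerance, n_snps=200, seed=1):
    """Rejection ABC: keep prior draws whose simulated summary lies near the observed one."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        m = rng.uniform(0.0, 5.0)  # draw the migration parameter from a uniform prior
        simulated = toy_simulator(m, n_snps, rng)
        if abs(simulated - observed) <= tolerance:
            accepted.append(m)  # accepted draws approximate the posterior
    return accepted


posterior = abc_rejection(observed=0.5, n_draws=5000, tolerance=0.02)
print(f"{len(posterior)} draws accepted; posterior mean m = {mean(posterior):.2f}")
```

Shrinking the tolerance makes the approximation to the posterior more accurate at the cost of accepting fewer draws.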

Finally, analyses of clinal patterns in hybrid zones provide information about levels of introgression for different loci [65]. In general, for tension zones, cline width is proportional to the dispersal distance and inversely proportional to the square root of the strength of selection acting on a locus. Thus, loci under strong selection will introgress little and have narrow clines, whereas loci under weak selection will introgress more and have wide clines. In recently formed hybrid zones, the signature of ongoing gene flow may dominate that of selection at linked sites, making hybrid zones good places to examine the role of recombination in models of speciation with gene flow. Patterns observed in hybrid zones may, however, reflect introgression over different time scales than patterns observed in allopatric populations.
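A common way to describe a cline, and our choice here since the text does not specify a functional form, is the sigmoid p(x) = [1 + tanh(2(x − c)/w)]/2, where c is the cline centre and w the width, defined as the inverse of the maximum slope. The sketch evaluates this curve for two hypothetical widths and recovers w numerically from the slope at the centre. Fitting real hybrid-zone data would add a likelihood over sampled genotypes, which is not shown.

```python
import math


def cline_frequency(x, centre, width):
    """Sigmoid (tanh) cline: allele frequency at position x along a transect."""
    return 0.5 * (1.0 + math.tanh(2.0 * (x - centre) / width))


def width_from_slope(centre, width, h=1e-6):
    """Recover cline width as 1 / (maximum slope), estimating the slope at the centre numerically."""
    slope = (cline_frequency(centre + h, centre, width)
             - cline_frequency(centre - h, centre, width)) / (2.0 * h)
    return 1.0 / slope


# Hypothetical transect positions (km): a narrow cline, as expected for a locus
# under strong selection, and a wide cline for a locus that introgresses freely.
for label, w in (("narrow", 5.0), ("wide", 50.0)):
    freqs = [round(cline_frequency(x, centre=0.0, width=w), 3) for x in (-20, -5, 0, 5, 20)]
    print(label, freqs, "width:", round(width_from_slope(0.0, w), 2))
```

Under speciation models, loci linked to isolation genes would be expected to sit at the narrow end of this range.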

4. Empirical patterns in rabbits and mice

The predictions outlined above can be tested in two groups of mammals with recently separated subspecies that hybridize in nature, rabbits and mice.

(a) Rabbits

The European rabbit (Oryctolagus cuniculus) consists of two major subspecies: Oryctolagus c. cuniculus—which is distributed in the northeastern portion of the Iberian peninsula and southern France—and Oryctolagus c. algirus—which is distributed throughout the southwestern part of the Iberian peninsula (figure 2a). These two lineages are believed to have diverged in allopatry approximately two million years ago [66]. Throughout the Pleistocene, they likely underwent periods of isolation and secondary contact as climatic changes allowed for range contraction and expansion. Today, their ranges meet in a zone that runs diagonally across the Iberian peninsula from the northwest to the southeast. Phenotypic differences between the subspecies are slight [67].

Figure 2.

(a) Geographical distribution of rabbit subspecies in the Iberian peninsula: Oryctolagus cuniculus algirus (blue) and Oryctolagus cuniculus cuniculus (red). The approximate location of the hybrid zone is shown in purple. (b) Examples of gene genealogies in rabbits for one X-linked locus (SHOX, FST = 0.907) and two autosomal loci (GK5, FST = 0.120 and TIAM1, FST = 0.008). Colours correspond to subspecies of origin.

Genetic variation within and between the two subspecies has been studied using mtDNA [66,68], serological typing [69], allozymes [70], microsatellites [71] and DNA sequences of autosomal and X-linked loci [72–74].

Patterns of genetic differentiation vary widely among loci (figure 2b). Deep haplotype divergence, with two major clades, is observed at many loci. For example, the mitochondrial cytochrome b gene contains two major lineages (one in each subspecies) with a sequence divergence of 11.9 per cent [66]. Similar patterns of deep divergence between subspecies are seen for the Y chromosome [75], most X-linked loci and many autosomal loci [72,74]. Despite this strong differentiation, FST varies from nearly 0 to 1 when all surveyed loci are considered. In general, high FST values are seen at loci that show two divergent haplogroups corresponding to the two subspecies (e.g. SHOX, figure 2b). Loci with low FST, however, are of two sorts. Some loci also have two divergent haplogroups, but both haplogroups are seen in both subspecies (e.g. GK5, figure 2b), a pattern consistent with a period of isolation followed by gene flow [74]. Other loci with low FST do not have two divergent haplogroups (e.g. TIAM1, figure 2b), a pattern that could reflect greater mixing and recombination after secondary contact, unsorted ancestral polymorphism or some combination of both.

Several studies have attempted to disentangle the relative contributions of unsorted ancestral variation and gene flow as explanations for shared polymorphisms between the rabbit subspecies. The rate of lineage sorting for neutral genes is a function of the effective population size [76]. Assuming a sex ratio of 1, mitochondrial and Y-linked genes experience an Ne that is one-fourth as large as Ne for autosomal genes, whereas X-linked loci experience an Ne that is three-fourths that of autosomal loci. As a result, we might expect differentiation to be highest for mtDNA and Y-linked genes, intermediate for X-linked genes and lowest for autosomal loci. In fact, this is what is observed: FST for the Y is 0.93 [77], FST for the mitochondrial Cytb gene is 0.83 [66], average FST for 27 X-linked loci is 0.45 [74] and average FST for 17 autosomal loci is 0.15 [74]. At face value, this suggests that much of the variation among loci in patterns of differentiation might be accounted for simply by differences in rates of lineage sorting. But the situation is more complicated. Rates of recombination also vary among the Y, mtDNA, X and autosomes. The mitochondrial genome and most of the Y chromosome are effectively free of recombination, and the X chromosome experiences less recombination than the autosomes (since it does not recombine in males outside of the pseudo-autosomal region). Thus, if regions of lower recombination have lower rates of gene flow, similar patterns might arise even without differences in rates of lineage sorting. A good test of this idea would come from comparisons among loci on the same chromosome that experience different rates of recombination. Unfortunately, the rabbit genetic map is insufficiently detailed to provide accurate estimates of the local recombination rate for different genomic regions.
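The lineage-sorting argument can be turned into a rough expectation using the relations for figure 1: with no gene flow, within-population diversity is 4kNeμ and Dxy is 2μt + 4kNeμ, where k is the relative effective size of each class of locus (1 for autosomes, 3/4 for the X, 1/4 for the Y and mtDNA under a 1:1 sex ratio). A Hudson-style FST = 1 − π/Dxy then reduces to t/(t + 2kNe). This sketch assumes equal ancestral and descendant population sizes and neutral equilibrium within populations, and it ignores recombination entirely, which is exactly why the predicted ordering alone cannot separate lineage sorting from reduced gene flow. The divergence time and Ne below are hypothetical.

```python
# Relative effective population size of each class of locus, assuming a 1:1 sex ratio.
RELATIVE_NE = {"autosome": 1.0, "X": 0.75, "Y": 0.25, "mtDNA": 0.25}


def expected_fst_isolation(t_generations: float, ne: float, k: float) -> float:
    """Expected Hudson-style FST after t generations of complete isolation.

    Uses pi_within = 4*k*Ne*mu and Dxy = 2*mu*t + 4*k*Ne*mu, so mu cancels:
    FST = t / (t + 2*k*Ne).
    """
    return t_generations / (t_generations + 2.0 * k * ne)


# Hypothetical split time equal to Ne generations.
ne = 100_000
for locus_class, k in RELATIVE_NE.items():
    print(f"{locus_class:9s} FST = {expected_fst_isolation(ne, ne, k):.2f}")
```

With t = Ne the expectations are about 0.33 for autosomes, 0.40 for the X and 0.67 for the Y and mtDNA, the same ordering as the observed values.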

In the absence of direct estimates of recombination rate, loci near the centromeres of metacentric chromosomes can be compared with loci far from the centromere, toward the telomeres. In many organisms, recombination is suppressed near the centromeres of metacentric chromosomes [78]. Two studies have taken this approach in rabbits. In the first, two loci near the centromere of the X chromosome were compared with two loci far from the centromere [72]. The centromeric loci had an average FST of 0.75, whereas the telomeric loci had an average FST of 0.02. In the second study, five autosomal loci near centromeres were compared with five autosomal loci near telomeres [73]. All telomeric loci showed low levels of differentiation (mean FST = 0.06), whereas three centromeric loci showed high differentiation (mean FST = 0.47 for these three loci) and two showed little differentiation (mean FST = 0.06 for these two loci). Importantly, levels of LD within subspecies were significantly higher for loci near centromeres than for loci near telomeres, suggesting that these two types of loci experience low and high recombination rates, respectively.

Is the pattern of higher FST at centromeric loci driven by reduced gene flow or by selection at linked sites? Figure 3a compares FST and the relative node depth (RND) for these loci. RND is Dxy between the subspecies divided by their divergence from an outgroup, in this case Lepus granatensis; scaling by outgroup divergence corrects for differences in mutation rate among loci [79]. Both FST and RND are higher for the genes near centromeres, consistent with greater gene flow at telomeric loci than at centromeric loci, as shown in figure 1a. Because selection at linked sites is not expected to increase Dxy (table 1), higher RND at centromeric loci is more readily explained by reduced gene flow. In addition, likelihood ratio tests show that IM models incorporating migration [56] fit the data significantly better than models without migration [74]. Moreover, IM estimates of gene flow at individual loci are higher for loci near telomeres than for loci near centromeres [73]. These observations suggest that reduced gene flow contributes to the higher differentiation at loci with lower recombination rates.
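RND can be computed directly from alignments. The sketch below assumes that the outgroup divergence is averaged over the two ingroup taxa, D_out = (D_XO + D_YO)/2, a common convention that the text does not specify; the sequences are hypothetical. Doubling every distance at a locus, roughly what a doubled mutation rate would do, doubles Dxy but leaves RND unchanged, which is the point of the correction.

```python
from itertools import product


def mean_divergence(seqs1, seqs2):
    """Mean proportion of differing sites between sequences from two groups."""
    def p_distance(a, b):
        return sum(x != y for x, y in zip(a, b)) / len(a)
    pairs = list(product(seqs1, seqs2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


def relative_node_depth(pop_x, pop_y, outgroup):
    """RND = Dxy / D_out, taking D_out as the mean divergence of X and Y from the outgroup (assumption)."""
    d_xy = mean_divergence(pop_x, pop_y)
    d_out = (mean_divergence(pop_x, outgroup) + mean_divergence(pop_y, outgroup)) / 2
    return d_xy / d_out


# Hypothetical loci: locus B has twice as many differences as locus A at every level.
rnd_a = relative_node_depth(["AAAAAAAAAA"], ["TAAAAAAAAA"], ["TTTTTAAAAA"])
rnd_b = relative_node_depth(["AAAAAAAAAA"], ["TTAAAAAAAA"], ["TTTTTTTTTT"])
print(f"RND locus A = {rnd_a:.3f}, RND locus B = {rnd_b:.3f}")
```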

Figure 3.

(a) Comparison of patterns of differentiation between centromeric loci (presumed to experience low recombination) and telomeric loci (presumed to experience high recombination) between subspecies of rabbits. (b) Comparison of patterns of differentiation between low-recombination loci and high-recombination loci between subspecies of house mice.

Does selection at linked sites also contribute to this pattern? Most loci harbour an excess of rare alleles [73], with negative values of Tajima's D [80] and Fu & Li's D [81]. The fact that this pattern is seen at nearly all loci is more consistent with a demographic explanation (such as a population expansion) than with a selective explanation. Most of these negative values are not significant, although a few are significantly lower than expected under a neutral, equilibrium model [73]. Importantly, however, the average value of Tajima's D or of Fu & Li's D is not lower at centromeric loci than at telomeric loci [73], as might be expected under a simple hitchhiking model. It is important to realize that these results provide only a weak test of selection, and do not rule out a role for more complicated forms of positive selection or a role for background selection.
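Tajima's D compares two estimators of θ: the average number of pairwise differences, π, and the number of segregating sites scaled by a1 = Σ 1/i. An excess of rare alleles makes π smaller than S/a1 and drives D negative. The sketch below follows Tajima's standard formula; it assumes a gap-free alignment under the infinite-sites model, requires at least four sequences, and does not assess significance. The toy sequences are hypothetical.

```python
import math
from itertools import combinations


def tajimas_d(seqs):
    """Tajima's D for a gap-free alignment of n >= 4 sequences (per-locus, not per-site, quantities)."""
    n = len(seqs)
    if n < 4:
        raise ValueError("need at least four sequences")
    length = len(seqs[0])
    # S: number of segregating sites.
    s = sum(len({seq[i] for seq in seqs}) > 1 for i in range(length))
    if s == 0:
        raise ValueError("Tajima's D is undefined when there are no segregating sites")
    # pi: average number of pairwise differences.
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs) / len(pairs)
    # Constants from Tajima's (1989) variance formula.
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n ** 2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


excess_rare = ["AAA", "TAA", "ATA", "AAT"]  # three singleton mutations
intermediate = ["AA", "AA", "TT", "TT"]     # two mutations at frequency 1/2
print(f"excess of rare alleles: D = {tajimas_d(excess_rare):.2f}")
print(f"intermediate alleles:   D = {tajimas_d(intermediate):.2f}")
```

A hitchhiking event would be expected to push D more negative at linked loci, which is why the lack of a centromeric versus telomeric difference is informative, if weakly.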

Finally, the rabbit subspecies do form a hybrid zone in the central portion of the Iberian peninsula. Analyses of clinal patterns of variation for loci near telomeres and for loci near centromeres could provide additional insight into the role of gene flow in preventing differentiation in some genomic regions. Speciation models predict that centromeric loci should have steeper clines than telomeric loci.

(b) House mice

House mice consist of three subspecies: Mus musculus musculus, Mus musculus domesticus and Mus musculus castaneus (also referred to as species by some authors). The subspecies diverged from one another over a short interval within the last 500 000 years [82]. Their phylogenetic relationships are difficult to resolve, although available evidence suggests that M. m. musculus and M. m. castaneus are sister subspecies [83]. Mus musculus domesticus is found in the Mediterranean region and Western Europe, M. m. musculus occurs in Eastern Europe and Northern Asia, and M. m. castaneus is found in Southeast Asia (figure 4). A well-defined hybrid zone between M. m. domesticus and M. m. musculus occurs in central Europe, running from Denmark to the Black Sea. Mus musculus molossinus is found in Japan, and seems to be derived from hybridization between M. m. musculus and M. m. castaneus. House mice are commensal with humans [84]. In historical times, house mice primarily from Western Europe have been spread in association with humans throughout much of the world, including the Americas, Africa, Australia and many oceanic islands [85].

Figure 4.

(a) Geographical distribution of house mouse subspecies: Mus musculus domesticus (blue), Mus musculus musculus (red) and Mus musculus castaneus (yellow). Mus musculus molossinus (orange) is found in Japan and is believed to derive from hybridization between M. m. musculus and M. m. castaneus. Ranges reflect inferred distributions before expansions associated with humans during the last few hundred years. (b) Examples of gene genealogies in house mice for one X-linked locus (Ocrl, FST = 0.867), one Y-linked locus (Jarid1d, FST = 0.907) and one autosomal locus (Clcn6, FST = 0.434).

The genetic basis of speciation in house mice has been studied extensively both in the laboratory and in natural populations. Most of this work has focused on reproductive isolation between M. m. domesticus and M. m. musculus, though some studies have included comparisons with M. m. castaneus and M. m. molossinus. In the laboratory, crosses between M. m. musculus and M. m. domesticus (or inbred strains such as C57BL/6J, which are largely of M. m. domesticus origin) have revealed that hybrid males suffer from sterility or reduced fertility, whereas hybrid females are usually fully fertile or show only slight reductions in fertility [86–90]. The only known hybrid sterility gene in vertebrates was recently identified in mice as Prdm9 [91]. Interestingly, this gene also underlies variation in fine-scale recombination rate in both mice and humans [92–94].

Genetic differentiation in wild mice has been studied both in hybrid and in allopatric populations. The hybrid zone between M. m. musculus and M. m. domesticus is a result of secondary contact following the spread of mice into Europe within the last 3000 years [95]. Most loci in this hybrid zone display concordant clines, although inter-locus variation in cline width and midpoint is observed [96–100]. Notably, the X chromosome and the Y chromosome show steeper clines than the autosomes in some transects, suggesting a role for these chromosomes in reproductive isolation [101,102]. At some autosomal loci, alleles from one subspecies are observed well into the range of the sister species, suggesting that gene flow between subspecies occurs across the hybrid zone [96,99].

Several studies have analysed multiple autosomal or X-linked loci in allopatric populations [82,103–107]. As in rabbits, gene genealogies vary widely among loci in mice (figure 4b). At some loci, such as Ocrl on the X chromosome, the three subspecies form three fully sorted groups of haplotypes. Other loci, such as Jarid1d on the Y chromosome, have three divergent lineages, but some haplotypes from one subspecies cluster with a lineage that is otherwise restricted to a different subspecies (figure 4b). This pattern is similar to the pattern seen at loci such as GK5 in rabbits (figure 2b), and is probably best explained by gene flow. Finally, at some loci the three subspecies are intermingled on the gene genealogy (e.g. Clcn6, figure 4b). As in rabbits, FST varies from nearly 0 to 1 among loci.

Several observations suggest that shared polymorphisms result at least partly from gene flow [59,82]. In comparisons between M. m. musculus and M. m. domesticus, the X chromosome has fewer shared polymorphisms and higher FST than the autosomes [82,105,106]. Geraldes et al. [82] analysed sequence data from eight loci and found that likelihood ratio tests in an IM framework rejected a strict allopatric model of speciation in each of the three comparisons between pairs of subspecies. Moreover, estimates of gene flow in these models were higher for autosomal loci than for X-linked loci. Lower gene flow on the X is also supported by hybrid zone studies, where loci on the X chromosome have narrower cline widths, on average, than do autosomal loci [96,101,108]. This general agreement between large-scale patterns of differentiation on the X versus autosomes in allopatric and hybrid populations suggests that reduced gene flow contributes to higher differentiation on the X chromosome. Importantly, this agrees well with numerous laboratory crosses that implicate much of the X chromosome in hybrid male sterility [87,88,109,110]. Thus, genes underlying reproductive isolation are associated with genomic regions showing greater levels of differentiation, as predicted by speciation models.
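Counts of shared polymorphisms, fixed differences and polymorphisms exclusive to one taxon are the basic quantities behind comparisons such as the X-versus-autosome contrast above. A minimal sketch, assuming aligned, equal-length haplotypes with biallelic sites; the sequences are invented and the category names are hypothetical labels chosen for this example.

```python
from collections import Counter
from typing import Dict, Sequence


def classify_sites(pop1: Sequence[str], pop2: Sequence[str]) -> Dict[str, int]:
    """Count site classes between two aligned samples (assumes biallelic sites)."""
    counts: Counter = Counter()
    # zip(*haplotypes) turns a list of haplotypes into a sequence of alignment columns
    for col1, col2 in zip(zip(*pop1), zip(*pop2)):
        alleles1, alleles2 = set(col1), set(col2)
        poly1, poly2 = len(alleles1) > 1, len(alleles2) > 1
        if poly1 and poly2:
            counts["shared"] += 1          # polymorphic in both taxa
        elif poly1:
            counts["exclusive_1"] += 1     # polymorphic only in taxon 1
        elif poly2:
            counts["exclusive_2"] += 1     # polymorphic only in taxon 2
        elif alleles1 != alleles2:
            counts["fixed"] += 1           # monomorphic in each, for different alleles
        # sites monomorphic for the same allele in both samples are ignored
    return dict(counts)


taxon_1 = ["ACGTG", "ACGAG", "TCGTG"]
taxon_2 = ["ACCTG", "TCCTA", "ACCTG"]
print(classify_sites(taxon_1, taxon_2))
```

On its own, a count of shared sites cannot separate ancestral polymorphism from migration, which is why such counts are combined with model-based methods such as IM.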

In contrast to rabbits, local recombination rates can be estimated in mice by comparing dense genetic maps to the reference genome sequence. Recombination rates vary across the genome [111,112], with rates in some regions differing among divergent strains [113]. Is differentiation between subspecies correlated with local recombination rate along the autosomes?
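Local recombination rate is commonly expressed in centimorgans per megabase (cM/Mb): the genetic-map distance between two markers divided by their physical distance in the reference genome. The sketch below interpolates between the two markers flanking a locus; the marker positions are hypothetical, and published estimates often smooth over larger windows.

```python
from bisect import bisect_right
from typing import Sequence


def local_rate_cm_per_mb(marker_mb: Sequence[float], marker_cm: Sequence[float],
                         locus_mb: float) -> float:
    """Recombination rate (cM/Mb) between the two markers that flank locus_mb.

    marker_mb must be sorted; marker_cm gives the genetic position of each marker.
    """
    i = bisect_right(marker_mb, locus_mb)
    if i == 0 or i == len(marker_mb):
        raise ValueError("locus lies outside the mapped interval")
    left, right = i - 1, i
    return (marker_cm[right] - marker_cm[left]) / (marker_mb[right] - marker_mb[left])


# Hypothetical map for one chromosome: physical (Mb) and genetic (cM) positions.
physical = [10.0, 20.0, 30.0, 40.0]
genetic = [5.0, 10.0, 10.5, 16.0]

for locus in (15.0, 25.0, 35.0):
    print(locus, local_rate_cm_per_mb(physical, genetic, locus))
```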

Takahashi et al. [104] analysed sequence data from 19 autosomal loci sampled in two inbred strains each of M. m. musculus, M. m. domesticus, M. m. castaneus and M. m. molossinus. Despite the small sample size (n = 2 for each subspecies), they found a significant negative correlation between FST and recombination rate, with recombination rate explaining about 40 per cent of the variation in levels of differentiation. This pattern was attributed to genetic hitchhiking or background selection, but the potential contribution of differences in levels of gene flow was not considered. FST in this analysis was based on the average among all four taxa, and patterns of differentiation between pairs of subspecies were not reported. Harr [105] analysed patterns of differentiation between inbred strains of M. m. musculus and M. m. domesticus for 10 000 SNPs that were ascertained in other samples and found no association between recombination rate and FST. Similarly, Teeter et al. [99] found no correlation between recombination rate and cline width for 39 markers in the M. m. musculus–M. m. domesticus hybrid zone.
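A statement such as "recombination rate explaining about 40 per cent of the variation" typically refers to the squared correlation coefficient, so r² ≈ 0.4 corresponds to r ≈ -0.63 for the negative relationship described. Assuming a Pearson correlation (the cited study may have used a different statistic), the calculation is short; the per-locus values below are invented.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Invented per-locus values: recombination rate (cM/Mb) and FST.
rec_rate = [0.1, 0.3, 0.5, 0.8, 1.2, 1.6]
fst = [0.62, 0.55, 0.40, 0.45, 0.30, 0.20]

r = pearson_r(rec_rate, fst)
print(f"r = {r:.2f}, r^2 = {r * r:.2f}")
```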

Geraldes et al. [107] analysed sequence data from 27 autosomal loci in population samples of each of the three subspecies. In each of the three pairwise comparisons between subspecies, FST was higher for genes in regions of low recombination than for genes in regions of high recombination (figure 3b). In comparisons between M. m. castaneus and M. m. musculus or M. m. domesticus, RND was higher for the low-recombination genes than for the high-recombination genes (figure 3b), consistent with greater gene flow at the high-recombination genes (figure 1a) and similar to the patterns seen in rabbits. In the comparison between M. m. musculus and M. m. domesticus, RND was lower for the low-recombination genes than for the high-recombination genes, although differences in FST were not significant in this comparison. IM models incorporating migration are a significantly better fit to the data than models without migration in all three pairwise comparisons [82], and IM estimates of gene flow at individual loci are higher for high-recombination loci than for low-recombination loci [107]. These results show that M. m. castaneus is more differentiated from M. m. musculus and from M. m. domesticus in regions of low recombination than in regions of high recombination and that differences in levels of gene flow account for some of this variation, in agreement with predictions from speciation models.
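Relative node depth (RND) scales between-taxon divergence by divergence to an outgroup, RND = dXY / dout, with dout taken here as the mean of the two ingroup-to-outgroup divergences; the normalisation helps control for variation in mutation rate among loci. The sketch assumes aligned haplotypes and uses simple p-distances without correction for multiple hits; the sequences and the outgroup are hypothetical.

```python
from itertools import product
from statistics import mean
from typing import Sequence


def d_xy(haps1: Sequence[str], haps2: Sequence[str]) -> float:
    """Mean proportion of differing sites over all between-sample haplotype pairs."""
    length = len(haps1[0])
    return mean(sum(a != b for a, b in zip(h1, h2)) / length
                for h1, h2 in product(haps1, haps2))


def relative_node_depth(taxon_x: Sequence[str], taxon_y: Sequence[str],
                        outgroup: Sequence[str]) -> float:
    """RND = d_XY / d_out, where d_out averages each ingroup's divergence to the outgroup."""
    d_out = (d_xy(taxon_x, outgroup) + d_xy(taxon_y, outgroup)) / 2
    return d_xy(taxon_x, taxon_y) / d_out


# Hypothetical single-locus example with one haplotype per taxon.
print(relative_node_depth(["AAAA"], ["AATT"], ["TTTT"]))
```

Recent gene exchange lowers dXY at a locus while leaving divergence to a distant outgroup largely unchanged, so lower RND at high-recombination loci is the direction expected if those loci experience more gene flow.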

Does selection also contribute to greater differentiation in regions of low recombination? For each of the three subspecies, Tajima's D and Fu and Li's D are both lower on average for the low-recombination loci than for the high-recombination loci, although these differences are slight and are not significant [107]. Baines & Harr [103] compared patterns of DNA sequence variation on the X chromosome and on the autosomes in ancestral and derived populations of M. m. musculus and of M. m. domesticus, and found some evidence for positive selection on the X chromosome of derived populations. No comparable study has been conducted on ancestral and derived populations comparing regions of low and high recombination. Studies of selection in high- and low-recombination regions would be useful, especially in comparisons between M. m. castaneus and M. m. musculus or between M. m. castaneus and M. m. domesticus, where variation in FST is associated with variation in recombination rate.
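Tajima's D contrasts two estimates of the population mutation parameter: nucleotide diversity (π, the mean number of pairwise differences) and Watterson's estimate based on the number of segregating sites S. Negative values indicate an excess of rare variants, as expected after a selective sweep (and also under background selection or population growth); positive values indicate an excess of intermediate-frequency variants. A minimal sketch of Tajima's standard formula, assuming aligned haplotypes, no missing data and at least four sequences (with fewer, the variance term is zero); the sequences are invented.

```python
import math
from itertools import combinations
from typing import Sequence


def tajimas_d(haps: Sequence[str]) -> float:
    """Tajima's D from aligned haplotypes (requires n >= 4 and at least one segregating site)."""
    n = len(haps)
    if n < 4:
        raise ValueError("need at least four sequences")
    # S: number of segregating (polymorphic) sites
    s = sum(len(set(column)) > 1 for column in zip(*haps))
    if s == 0:
        raise ValueError("no segregating sites; D is undefined")
    # pi: mean number of pairwise differences
    pairs = list(combinations(haps, 2))
    pi = sum(sum(a != b for a, b in zip(h1, h2)) for h1, h2 in pairs) / len(pairs)
    # constants from Tajima's variance formula
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


rare = ["AAA", "AAA", "AAA", "TTT"]          # variants all at frequency 1/4
intermediate = ["AAA", "AAA", "TTT", "TTT"]  # variants all at frequency 1/2
print(round(tajimas_d(rare), 2), round(tajimas_d(intermediate), 2))
```

Fu and Li's D uses a related contrast based on external-branch (singleton) mutations and is not implemented here.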

5. Future prospects

Multiple models predict that genomic regions with little recombination will exhibit relatively high levels of differentiation in recently separated taxa and this prediction is supported by data from rabbits and mice. The ability to distinguish between speciation with gene flow and selection at linked sites as explanations for this pattern would be improved most rapidly by investing in two areas.

First, we need to identify those aspects of genetic variation that best separate the two processes. Because speciation with gene flow and selection at linked sites are evolutionarily distinct phenomena, careful comparisons should reveal differences in the signatures they leave in patterns of diversity. Analysis of simulated populations in which one or both processes are operating should lead to diagnostic combinations of existing summary statistics (e.g. measures of LD and the frequency spectrum), and could suggest new summary statistics. In addition to identifying more informative measures of variation, simulations should show whether the two models predict different genomic scales of differentiation. For example, we might expect recent gene flow to produce larger regions of high differentiation than selection at linked sites. Furthermore, simulations should help disentangle the relative contributions of recombination rate, selection and gene flow to genomic patterns of differentiation [114]. Simulations will also encourage model-based inference, rather than reliance on analytical theory with restrictive assumptions. Informative combinations of summary statistics could be used in an approximate Bayesian computation (ABC) framework to compare directly the support for speciation with gene flow and for selection at linked sites in a dataset.
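The ABC approach mentioned here can be illustrated with rejection sampling for model choice: simulate summary statistics under each candidate model, keep simulations whose statistics fall within a tolerance of the observed values, and treat the proportion of accepted simulations from each model as its approximate posterior probability (assuming equal prior model probabilities). In this sketch the two simulators are toy placeholders returning normally distributed values for a hypothetical (mean FST, Tajima's D) pair; a real analysis would replace them with population-genetic simulations of speciation with gene flow and of selection at linked sites. The "observed" values are invented.

```python
import math
import random
from typing import Callable, Dict, Sequence, Tuple

Stats = Tuple[float, float]


def simulate_gene_flow(rng: random.Random) -> Stats:
    # TOY PLACEHOLDER for simulations under speciation with gene flow:
    # returns a hypothetical (mean FST, Tajima's D) pair.
    return (rng.gauss(0.30, 0.05), rng.gauss(0.0, 0.2))


def simulate_linked_selection(rng: random.Random) -> Stats:
    # TOY PLACEHOLDER for simulations under selection at linked sites.
    return (rng.gauss(0.60, 0.05), rng.gauss(-0.8, 0.2))


def abc_model_choice(observed: Sequence[float],
                     simulators: Dict[str, Callable[[random.Random], Stats]],
                     n_sims: int = 20_000, tolerance: float = 0.1,
                     seed: int = 1) -> Dict[str, float]:
    """Rejection ABC: approximate posterior model probabilities under equal model priors."""
    rng = random.Random(seed)
    names = list(simulators)
    accepted = {name: 0 for name in names}
    for _ in range(n_sims):
        name = rng.choice(names)               # draw a model from a uniform prior
        stats = simulators[name](rng)          # simulate summary statistics under it
        if math.dist(stats, observed) <= tolerance:
            accepted[name] += 1                # keep simulations close to the data
    total = sum(accepted.values())
    if total == 0:
        raise ValueError("no simulations accepted; increase n_sims or tolerance")
    return {name: count / total for name, count in accepted.items()}


observed = (0.32, -0.1)  # invented "observed" summary statistics
simulators = {"gene_flow": simulate_gene_flow, "linked_selection": simulate_linked_selection}
probs = abc_model_choice(observed, simulators)
print(probs)
```

In practice the tolerance, the number of simulations and, above all, the choice of summary statistics strongly affect the result; statistics that do not discriminate between the models can yield misleading model probabilities.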

A second promising approach to distinguishing these models is to integrate studies of population differentiation with genetic studies of reproductive isolation phenotypes [24,115]. Speciation with gene flow specifically predicts that genomic regions with reduced recombination rates will contain genes responsible for reproductive isolation, whereas models of selection at linked sites make no such prediction. Comparing the genomic positioning of loci responsible for reproductive isolation phenotypes with the genomic patterning of differentiation would offer the added benefit of gauging the contributions of the phenotypes being studied to gene flow in nature.
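One simple way to compare the positions of reproductive-isolation loci with patterns of differentiation is a permutation test: ask whether loci implicated by crosses show higher mean FST than random sets of the same size drawn from all surveyed loci. A minimal sketch with invented locus names and FST values; a naive permutation like this ignores linkage among nearby loci and confounding by recombination rate, which a real analysis would need to address (for example, by permuting within recombination-rate classes).

```python
import random
from statistics import mean
from typing import Dict, Sequence


def permutation_p_value(fst_by_locus: Dict[str, float], candidate_loci: Sequence[str],
                        n_perm: int = 10_000, seed: int = 1) -> float:
    """One-sided p-value: how often do random locus sets match or exceed the candidates' mean FST?"""
    rng = random.Random(seed)
    loci = list(fst_by_locus)
    k = len(candidate_loci)
    observed = mean(fst_by_locus[locus] for locus in candidate_loci)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(loci, k)
        if mean(fst_by_locus[locus] for locus in sample) >= observed:
            hits += 1
    # add-one correction so the estimated p-value is never exactly zero
    return (hits + 1) / (n_perm + 1)


# Invented data: 20 loci, three of which (ri_1..ri_3) stand in for loci mapped in crosses.
fst_values = {f"locus_{i}": 0.1 for i in range(17)}
fst_values.update({"ri_1": 0.9, "ri_2": 0.9, "ri_3": 0.9})

print(permutation_p_value(fst_values, ["ri_1", "ri_2", "ri_3"]))
```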

Acknowledgements

We thank Pennie Liebig for help with the preparation of the manuscript. We thank M. Carneiro and A. Geraldes for comments on the manuscript. This work was supported by NSF and NIH grants to M.W.N. and by NSF and NIH grants to B.A.P.


References
