Chromosomal inversions impact genetic variation and facilitate speciation in part by reducing recombination in heterokaryotypes. We generated multiple whole-genome shotgun sequences of the parapatric species pair Drosophila pseudoobscura and Drosophila persimilis and their sympatric outgroup (Drosophila miranda) and compared the average pairwise differences for neutral sites within, just outside and far outside of the three large inversions. Divergence between D. pseudoobscura and D. persimilis is high inside the inversions and in the suppressed recombination regions extending 2.5 Mb outside of inversions, but significantly lower in collinear regions further from the inversions. We observe little evidence of decreased divergence predicted to exist in the centre of inversions, suggesting that gene flow through double crossovers or gene conversion is limited within the inversion, or selection is acting within the inversion to maintain divergence in the face of gene flow. In combination with past studies, we provide evidence that inversions in this system maintain areas of high divergence in the face of hybridization, and have done so for a substantial period of time. The left arm of the X chromosome and chromosome 2 inversions appear to have arisen in the lineage leading to D. persimilis approximately 2 Ma, near the time of the split of D. persimilis–D. pseudoobscura–D. miranda, but likely fixed within D. persimilis much more recently, as diversity within D. persimilis is substantially reduced inside and near these two inversions. We also hypothesize that the inversions in D. persimilis may provide an empirical example of the ‘mixed geographical mode’ theory of inversion origin and fixation, whereby allopatry and secondary contact both play a role.
Examining the genomic landscape between closely related species can reveal areas of high divergence that are often targeted as potential signatures of speciation. Between some species, regions of high divergence appear to have been established and maintained purely by selection (reviewed in ). In others, establishment and maintenance of regions of high divergence appears facilitated by structural differences between species [2,3].
Chromosomal rearrangements such as inversions (an orientation reversal of a piece of DNA within a single chromosome) impact species divergence by reducing recombination in heterokaryotypes [2,4–6]. For paracentric inversions, which do not span a centromere, a single meiotic recombination event between heterokaryotypes results in acentric and dicentric products—both of which do not produce viable gametes (electronic supplementary material, figure S1). In Drosophila, acentric and dicentric products remain in the polar bodies in females (males, generally, do not recombine); therefore, paracentric inversions (i) do not affect fertility, (ii) are not structurally underdominant, (iii) effectively prevent recombination within the inverted region and (iv) are the most common type of inversion observed in Drosophila [7,8]. This suppressed effective recombination between heterokaryotypes of paracentric inversions may facilitate speciation by protecting locally adapted alleles inside the inversions from gene flow with other populations, allowing further divergence between two karyotypes (reviewed in earlier studies [9,10]). While theoretical [11–14] and empirical [15–21] support for this hypothesis exists, the maintenance of genomic divergence-within inversions over long time periods (many hundreds of thousands of generations), and the shape of elevated divergence-within inversions have been less explored empirically (but see [2–5]).
The diverging sister taxa of Drosophila pseudoobscura and Drosophila persimilis offer a suitable system for evaluating the impacts of inversions on genome-wide diversity and divergence. Drosophila pseudoobscura is a widespread species, ranging across western Canada, the United States and part of Central America, with a disjunct, allopatric subspecies in Colombia (D. pseudoobscura bogotana). Its sister species, D. persimilis, is restricted to the far west of Canada and the United States. The two D. pseudoobscura subspecies shared a common ancestor with D. persimilis approximately 500 000 years ago [22,23] and the two subspecies diverged from each other approximately 200 000 years ago [22,24]. With one exception, hybrid males from all possible crosses of the three species/subspecies are sterile, and a very small number of D. pseudoobscura–D. persimilis hybrids have been found in the wild [25,26]. In total, these species have six chromosome arms (three telocentric autosomes (chromosomes 2–4), one ‘dot’ autosome (chromosome 5) and a metacentric X chromosome with a left and right arm, XL and XR, respectively), and two inversions—one on the XL and one on the second chromosome—are fixed between D. pseudoobscura and D. persimilis . The inversions on the second chromosome and the XL are slightly greater than 7 Mb each (23% and 34% of each chromosome arm, respectively), and another inversion on the XR, which is polymorphic within D. persimilis, is 13 Mb (45% of the XR chromosome arm) . These three inversions are derived in D. persimilis [27,28], and D. persimilis that do not harbour the XR inversion exhibit sex-ratio distortion . Chromosome 3 segregates for more than 30 different inversion arrangements within D. pseudoobscura .
The three inversions differentiate between D. persimilis and D. pseudoobscura and encompass important alleles for maintaining reproduction isolation in the face of gene flow [15,16]. For hybrids of D. persimilis and D. pseudoobscura, reproductive traits such as male sterility, male mating success, female species preferences and hybrid inviability map to the XL and chromosome 2 inversions [15,16], and male courtship dysfunction localizes to the XR inversion as well as the fixed inversions . In the allopatric species pair, D. ps. bogotana and D. persimilis, no gene flow occurs, reproductive isolation alleles do not mix in collinear regions and alleles for reproductive isolation have accumulated outside of inversions [30–33]. In contrast, in the feebly hybridizing, parapatric D. pseudoobscura and D. persimilis, sterility of hybrid males localizes predominantly to inversions . Thus, it appears that inversions play an important role in maintaining species differences in sympatry .
Based on sequence analyses and genetic mapping data, we inferred that gene flow has homogenized the collinear regions of D. pseudoobscura and D. persimilis [2,4,5,34], but sequence divergence is maintained in and approximately 2.5 Mb outside of the chromosome 2, XL and XR inversions owing to reduced recombination [2,4–6,34]. Little reduction of divergence toward the centre of these inversions is observed [2,5,6]. However, these inferences are based on sparse sequence coverage (either in number of genomes or in coverage of genomes) rather than high-sequence depth comparisons.
Gene flow may erode divergence within inversions through the action of gene conversions and double crossovers. Small gene conversion events (approx. 200 bp ) are expected to act homogeneously across the inversion , except at regions extremely close to the inversion breakpoints (e.g. less than 17 kb ). Double crossovers between the karyotypes rarely encompass areas near breakpoints, and breakpoints remain diverged even when the centre of the inversion experiences gene flow [3,37–39].
Recent theoretical work has precipitated a need to re-examine divergence patterns across fixed inversions between D. persimilis and D. pseudoobscura. First, in some cases, theoretical simulations indicate that long-term maintenance of divergence in inversion-localized genes may wear away upon secondary contact through the action of gene conversion and double crossovers . Second, coalescent models suggest that linked neutral divergence between karyotypes is expected to be higher at breakpoints and near locally adapted alleles in the inversion, even for relatively old inversions . We examine the diversity and divergence patterns across fixed inversions for D. persimilis and D. pseudoobscura, in the light of this recent theoretical work and with more extensive sequence data than past studies, to evaluate divergence in inversion centres relative to breakpoints.
We also compare the divergence patterns of D. persimilis and D. pseudoobscura with divergence patterns of the two species to their closest outgroup, D. miranda. Drosophila miranda diverged from D. pseudoobscura and D. persimilis approximately 2 Ma [22,41,42]. This outgroup does not harbour the three inversions and cannot successfully hybridize with any of the other species in our study. By comparing the D. persimilis–D. pseudoobscura divergence with the divergence patterns to their closest outgroup, we draw inferences regarding the coalescence of the arrangements for each inversion.
We leverage new whole-genome shotgun sequences, originally sequenced for a companion study, to estimate pairwise differences in intergenic regions across collinear and inverted regions on chromosome 2, XL and XR. Specifically, our comparisons include two resequenced lines of D. persimilis, 10 resequenced lines of D. pseudoobscura and three resequenced lines of D. miranda. We measured nucleotide differences outside and within the inversions with special attention to the distance from the inversion breakpoints. Recombination suppression extending outside of the inversions has been more finely pinpointed  compared with past work ; therefore, the effect of inversion-associated recombination suppression can be more accurately measured.
2. Material and methods
For each line, DNA from 15 to 20 virgin female inbred flies was extracted using the GentraSystems PureGene DNA preparation (Minneapolis, MN, USA) and sequenced with Illumina Genome Analyzer IIx at the University of North Carolina-Chapel Hill High-Throughput Sequencing Facility (see electronic supplementary material, table S1 for type of read and amount of data obtained for each line). Illumina reads were aligned to the D. pseudoobscura reference genome v2.9 using bwa-0.5.5 (default alignment settings were used except the maximum number of gap extensions was set to 4) . Consensus assemblies for each resequenced line were generated using the bwa alignments and samtools 0.1.6 pileup . Default pileup settings were used except the theta parameter (error-dependency coefficient) in the maq consensus calling model was set to 0.9, and the number of haplotypes in the sample was set to 1 because each resequenced line was inbred. Each consensus sequence was filtered with custom python scripts. We excluded bases with read coverage less than four or with Phred-scale consensus mapping score of less than 30. We also excluded insertions and deletions that were supported by less than 70 per cent of reads and excluded 5 bp flanking the indel.
We estimated average pairwise differences for intergenic regions across collinear and inverted regions on chromosome 2 and the XL chromosome arm using two resequenced lines of D. persimilis and the reference D. persimilis genome, 10 resequenced lines of D. pseudoobscura and the D. pseudoobscura reference genome, and three resequenced lines of D. miranda. For the XR, we included all D. pseudoobscura and D. miranda lines, but only one resequenced line of D. persimilis and the reference D. persimilis genome. The other resequenced line of D. persimilis was a sex-ratio distortion line and does not contain the XR inversion. This sequencing effort was initially motivated by a companion study, which did not require even sampling across species. Intergenic regions are best suited for this assay, as opposed to intron or synonymous sites, because the distance between the inversion breakpoint and the nearest predicted gene is often very large . Bases were considered intergenic if no coding or intron regions were annotated in Flybase for D. pseudoobscura. Results from intron bases and fourfold degenerate sites are consistent with intergenic regions (see electronic supplementary material, figures S2–S7). Inversion breakpoints were defined as in Noor et al. .
All analyses were performed in jmp v. 8.0.2. We used Mann–Whitney U-test and Wilcoxon signed-rank matched pairs tests to examine whether average pairwise divergence between regions differed, and corrected for multiple-testing using sequential Bonferroni [45,46]. All statistics were performed on average pairwise divergence across (i) 10 and (ii) 100 kb non-overlapping windows. For all tests, chromosomes were divided into four regions: inside the inversion (IN), regions of recombination suppression outside but near the inversion (2.45–2.84 Mb; see [5,6]; NEAR), regions outside of the inversion and the region of recombination suppression (FAR) and breakpoints. The number of bases included in the breakpoint region was dependent on the window size. For the 10 kb window, we included two 10 kb windows on each side of the inversion. For the 100 kb window, we used one 100 kb window spanning the breakpoint. We explored various numbers of windows and performed the analysis with (i) ten 10 kb windows spanning each inversion breakpoint or (ii) five 100 kb windows spanning each inversion breakpoint. These added windows served only to obscure the divergence difference between the breakpoint and the inverted region and were not used in further analysis. We excluded 3 Mb proximal to the centromere and 1 Mb closest to the telomere; centromeric effects on diversity and divergence appear to extend farther into the genome. For the XL chromosome arm, the FAR region was only on the centromeric side of the inversion because any bases that qualified on the telomeric side were within 1 Mb of the end of the chromosome.
3. Results and discussion
For chromosomal speciation by suppressed recombination to be supported: (i) sister taxa must maintain fixed inversion differences, (ii) recombination must be suppressed within/near the inversion, and (iii) reproductive isolation alleles must localize to the inverted region . The D. pseudoobscura–D. persimilis system satisfies all requirements [6,15,16,47], and our study quantifies the genomic impacts of chromosomal speciation between D. pseudoobscura and D. persimilis. Our results suggest that within and near the inversions (especially, chromosome 2 and XL) divergence between heterokaryotypes has been maintained for approximately 2 Myr.
(a) Divergence patterns in and near inversions
Evidence from other systems, where chromosomal rearrangements have played a role in speciation, suggests that divergence is maintained predominantly very close to the inversion breakpoints [3,39]. Between D. pseudoobscura and D. persimilis for intergenic regions, breakpoints for the XL chromosome did not exhibit higher divergence relative to that observed inside the inversions. However, for both of the scales measured (10 or 100 kb windows; electronic supplementary material, tables S2 and S3), chromosome 2 and XR breakpoints exhibited qualitatively higher divergence in intergenic regions relative to that observed inside the inversions. The largest observed difference was in the 100 kb window analysis of chromosome 2; the median average divergence for breakpoint windows (one 100 kb window at each breakpoint, 0.032) was approximately 1.7 times the median average divergence inside the inversion (0.019). Likewise, breakpoints of the XR inversion harbour approximately 1.1–1.4 times the amount of divergence seen inside the inversion (100 and 10 kb windows, respectively). Power to assess significance was low in all cases, because the number of intervals which spanned the breakpoint was small, but overall, we do not observe a striking difference in divergence between the regions immediately spanning the breakpoints and regions further inside the inversion.
In some cases, gene flow during secondary contact between diverged karyotypes may erode divergence in the centre of inversions [36,48] if low levels of gene flux occur in heterokaryotypes. In D. pseudoobscura–D. persimilis heterokaryotypes, the rate of double crossovers within the inverted region is approximately 1 × 10−4 (measured over the XR inversion )—four orders of magnitude higher than values that could theoretically erode allele frequency differences between karyotypes under certain scenarios . However, for all three inverted regions between D. pseudoobscura and D. persimilis, average pairwise divergence in intergenic regions is significantly higher in rearranged versus FAR collinear regions (figures 1–3; electronic supplementary material, tables S2 and S3; p < 0.0001, Z > 5.748 in all cases for both 10 and 100 kb windows) even though these two species diverged several millions of generations ago.
Divergence also extends beyond the inversion boundaries into collinear regions [4,5], likely because recombination is suppressed in these regions as well [2,6]. For the XL and chromosome 2, we confirmed that divergence for NEAR regions was significantly higher than FAR regions (figures 1 and 2; electronic supplementary material, tables S2 and S3; p < 0.0001, Z > 5.428 in all cases for both 10 and 100 kb windows). For the XR inversion, NEAR regions were significantly more divergent than FAR regions only for the 10 kb window analysis before Bonferroni adjustment (p = 0.0331 before Bonferroni, Z = 2.131; figure 3; electronic supplementary material, tables S2 and S3). The putatively ancestral XR arrangement is still segregating in some populations of D. persimilis (homosequential to D. pseudoobscura); thus differences between IN, FAR, and NEAR genomic regions in the XR are expected to be less defined than the inversions on the XL and chromosome 2.
(b) Heterogeneity between inversions in magnitude of D. persimilis–D. pseudoobscura divergence
The D. pseudoobscura–D. persimilis divergence between the inversions follows the pattern: XL > 2 > XR. This pattern was suggested previously to indicate that the inversions arose and fixed sequentially—each capturing a different coalescence point and preventing further gene flow within the bounds of the inversion. In this scenario, gene flow continued for the remainder of the genome until a subsequent inversion arose and became established . Alternatively, the heterogeneous divergence pattern (XL > 2 > XR) seen across the inversions was posited the result of differential mutation pressures across the genome , as a similar divergence pattern across chromosomes was observed for D. pseudoobscura and D. miranda which are not separated by inversion. The working model for this system from Kulathinal et al.  is that the three inversions arose and the inversions on XL and chromosome 2 became fixed in the nascent D. persimilis lineage shortly after the separation of D. miranda–D. pseudoobscura–D. persimilis . Our data refine this hypothesis in several ways.
First, we found that divergence in inverted regions for both the XL and chromosome 2 between D. pseudoobscura and D. persimilis was greater than the divergence of either with putative outgroup species D. miranda (matched pairs for all sequential Bonferroni-corrected comparisons for both 10 and 100 kb windows: XL: p < 0.001 median average pairwise divergence across all windows in inversion ps–per = 0.0231, ps–mir = 0.0208, per–mir = 0.0212; chromosome 2: p < 0.042, ps–per = 0.0185, ps–mir = 0.0178 and per–mir = 0.0178). These results suggest that, within the XL and chromosome 2 inversions, D. persimilis and D. pseudoobscura share a deeper ancestor than they do with D. miranda. However, the D. pseudoobscura–D. miranda divergence is equal to the D. persimilis–D. miranda divergence (sequential Bonferroni-corrected p > 0.084 in all cases) which would argue that the three arose at nearly the same time. To account for this discrepancy, D. persimilis–D. pseudoobscura divergence would have had to increase within the inversions, perhaps at and near sites under divergent selection between D. persimilis and D. pseudoobscura (reviewed in ). While very speculative, this hypothesis is indirectly supported in that sites thought to be under less selective constraint (intron and fourfold degenerate sites; electronic supplementary material, figures S2–S7) exhibit this ‘overly elevated’ D. persimilis–D. pseudoobscura divergence relative to D. miranda to a lesser extent than putatively less-neutral intergenic regions.
The inversions also may coalesce deeper in evolutionary history than the remainder of the genome. Using the collinear FAR region on the centromeric side (FAR-C) for the XL and chromosome 2, we observed that the D. pseudoobscura–D. persimilis divergence is less than the divergence of either from D. miranda, indicating that gene flow has homogenized collinear regions relative to inverted regions (XL: ps–per = 0.0166, ps–mir = 0.0227, per–mir = 0.0219; chromosome 2: ps–per = 0.0150, ps–mir = 0.0191, per–mir = 0.0185, match pairs p < 0.0001, for all sequential Bonferroni-corrected comparisons for both 10 and 100 kb windows).
Second, the hypothesis that the inversions arose sequentially, and captured different time points during divergence of the species, can be examined with our data. While the D. pseudoobscura–D. persimilis divergence is heterogeneous among inversions, the D. pseudoobscura–D. persimilis divergence relative to the D. pseudoobscura–D. miranda divergence (which are not differentiated by these inversions) is also heterogeneous among inversions, and such a comparison accounts for heterogeneous mutation rates across chromosomes. The D. pseudoobscura–D. persimilis divergence relative to the D. pseudoobscura–D. miranda divergence is approximately 1.10 times higher within the XL inversion than within the chromosome 2 inversion and approximately 1.27 times higher within the XL inversion than within the XR inversion (all pairwise comparisons, p < 0.0001). Likewise, the D. pseudoobscura–D. persimilis divergence relative to the D. pseudoobscura–D. miranda divergence is approximately 1.17 times higher for the chromosome 2 inversion than the XR inversion. This result tentatively suggests a mechanism of a sequential origin of the inversions, with the inversions on the XL and chromosome 2 arising before XR, and possibly with the XL inversion arising before the chromosome 2 inversion. However, the XR inversion in D. persimilis remains polymorphic. Within-D. persimilis gene flow between arrangements on the XR is relatively substantial , and may be why the XR inversion appears much younger than the XL and chromosome 2 inversions. Thus, any accurate estimation of the timing of the XR inversion's origin is difficult. Likewise, it is possible that the apparent relative age of the inversions (XL > 2 > XR) is actually more a reflection of the relative time that each inversion fixed, not necessarily the timing of their origin; so, definitive conclusions regarding the sequential origin and fixation of these inversions are nearly impossible to make at this time.
Finally, because the XR inversion remains polymorphic and is not fixed within D. persimilis, we can assert that the timing of fixation of the XR inversion is secondary to the fixation of the chromosome 2 and XL inversions. The strong reduction in nucleotide diversity in the XL and chromosome 2 inversions of D. persimilis suggests that, while the inversions may have arisen long ago, the fixation of the XL and chromosome 2 inversions was more recent, probably near or since the D. persimilis–D. pseudoobscura–D. miranda split (sensu [14,48]). Within D. persimilis, nucleotide diversity is low over the regions encompassed in and near the XL (median diversity: FAR = 0.008, NEAR = 0.006 and IN = 0.005; Z > 3.174, Bonferroni-corrected p = 0.024; figures 1 and 2) and chromosome 2 inversions (median diversity: FAR = 0.010, NEAR = 0.008 and IN = 0.005; Z > 4.487, p < 0.001, in all cases), whereas a similar reduction in diversity is much weaker or non-existent for D. pseudoobscura (XL median diversity: FAR = 0.012, NEAR = 0.012 and IN = 0.012; chromosome 2 median diversity: FAR = 0.012, NEAR = 0.012 and IN = 0.011, all comparisons not significant; figures 1 and 2). Interestingly, the D. persimilis nucleotide diversity in and near the XR inversion is greater than the diversity in the region outside of the inversion (FAR = 0.001, NEAR = 0.003 and IN = 0.003; significant for 10 kb windows only, Bonferroni-corrected p < 0.044 both comparisons; figure 3), consistent with this inversion being polymorphic within D. persimilis and the inferred exchange between the two arrangements within that species .
Our data appear consistent with a model that treats the origin and fixation of an inversion as two separate steps. Theoretically, the inversions may have persisted as low-frequency variants within the nascent D. persimilis lineage for a long period of time (accounting for the deep coalescence) and more recently the inversions were driven to fixation (accounting for the reduced diversity within D. persimilis). Theoretical underpinnings of such a mechanism were recently presented (‘mixed mode geographic model’)  and rely on periods of allopatry (for an inversion to arise in a population) and secondary contact (to drive the inversion to fixation). This theory appears to describe our system well. The inversions on the XL and chromosome 2 may have arisen in allopatric lineages leading to D. persimilis. More recently, allopatric populations of D. persimilis and D. pseudoobscura came into contact and hybridized, favouring the spread of the low-frequency D. persimilis inversions owing to their recombination-reducing effect. Other predictions of the mixed mode model are also consistent with our data. For example, when populations re-establish secondary contact and population sizes differ, inversions will more often elevate in frequency in the smaller of the two populations  (in this case D. persimilis).
4. Future directions
First, while we have qualitatively described patterns of diversity and divergence with respect to inversions, formal model testing would better elucidate the processes responsible for these patterns. With formal model testing, we could evaluate a range of plausible parameter values for migration, gene flux, and selection to develop a more quantitative understanding of the forces influencing the D. pseudoobscura–D. persimilis inversions. Recent theoretical works [14,40,50] offer promising steps to enable such parametrization, and such analysis may be plausible in the future.
Second, inversions may have a greater role in shaping genomic diversity than the effects explored in our current study. When present in a heterozygous state, inversions stall the pachytene checkpoint and likely allow more time during meiosis for double strand breaks to be resolved as crossovers; thus, heterozygous inversions increase crossing over across the genome (‘interchromosomal effect’) [51–54] and this effect has been observed in the D. pseudoobscura system in particular [6,55]. Broadly, crossing over is positively associated with genomic diversity [56,57]; thus, inversions fixed between parapatric species may inadvertently increase genomic diversity in collinear regions when these species hybridize through the interchromosomal effect. Investigating this impact of inversions in creating and maintaining genomic diversity through the interchromosomal effects may reveal in part how hybrid zones contribute to within species diversity and the evolution of new adaptations.
In conclusion, while inversions play an important role in reproductive isolation in the D. pseudoobscura–D. persimilis system, they are just one avenue in which accentuated divergence may accumulate between the genomes of diverging, but occasionally hybridizing taxa [1,40]. Reduced recombination, in general, can contribute to divergence during the speciation process (reviewed in Nachman & Payseur ). Further, genomic scans and experimental approaches have identified regions with greater differentiation than expected under neutrality that are clustered but not associated with inversions (reviewed in earlier studies [1,59,60]). We look forward to a more comprehensive understanding of how often and when chromosomal rearrangements and these other processes facilitate the speciation process.
We thank L. Stevison, R. Guerrero, M. Kirkpatrick and three additional anonymous reviewers for critical comments on the manuscript. C. Jones provided useful discussion, and P. Nosil and J. Feder provided valuable insights. Funding for this research was provided by NIH grants GM076051, GM086445 and GM092501.
One contribution of 13 to a Theme Issue ‘Patterns and processes of genomic divergence during speciation’.
- This journal is © 2011 The Royal Society