In contrast to the large amount of ecological information supporting the role of natural selection as a main cause of population divergence and speciation, an understanding of the genomic basis underlying those processes is in its infancy. In this paper, we review the main findings of a long-term research programme that we have been conducting on the ecological genomics of sympatric forms of whitefish (Coregonus spp.) engaged in the process of speciation. We present this system as an example of how applying a combination of approaches under the conceptual framework of the theory of adaptive radiation has yielded substantial insight into evolutionary processes in a non-model species. We also discuss how the joint use of recent biotechnological developments will provide a powerful means to address issues raised by observations made to date. Namely, we present data illustrating the potential offered by combining next generation sequencing technologies with other genomic approaches to reveal the genomic bases of adaptive divergence and reproductive isolation. Given increasing access to these new genomic tools, we argue that non-model species studied in their ecological context such as whitefish will play an increasingly important role in generalizing knowledge of speciation.
One hundred and fifty years after the publication of The Origin of Species (Darwin 1859), elucidating the processes of population divergence and species diversification remains a central goal of evolutionary biology. It is also increasingly recognized that understanding the evolutionary mechanisms of adaptive divergence is crucial to predicting how organisms will respond to global crises such as climate change, habitat loss, resistance to antibiotics or the spread of invasive species, to name a few (Bernatchez & Tseng 2008). As for any scientific discipline, progress in this field will be best achieved if studies are embedded within a strong, predictive, theoretical framework. The theory of adaptive radiation relates directly to Darwin's (1859) vision on the role of natural selection and adaptation driven by ecological processes in the course of diversification: it holds that both phenotypic divergence and ultimately speciation are the outcomes of divergent natural selection stemming from environmental heterogeneity and competitive interactions. In revising this theory in the light of studies that have been conducted since its formulation, Schluter (2000) showed that it remains one of the most successful conceptual frameworks ever advanced in evolutionary biology. Studies that have accumulated since 2000 also convincingly demonstrated that knowledge of the ecology of adaptive radiation has made substantial leaps and allowed identification of the most pressing research areas.
In contrast to the large amount of ecological information supporting the role of divergent natural selection as a main cause of population divergence and speciation, a thorough understanding of the genomic basis underlying these evolutionary processes remains in its infancy (Mitchell-Olds et al. 2007; Schluter 2009; Via 2009), being limited to traditional model species whose ecology is generally poorly known (e.g. Drosophila species, see Noor & Feder 2006). Over the last few years, however, identifying genes and QTL (quantitative trait loci) implicated in adaptive divergence and reproductive isolation has become one of the most active research areas in evolutionary biology. Remarkable progress has been accomplished in an increasing number of organisms, both in plants (e.g. Schemske & Bierzychudek 2007; Kane et al. 2009) and animals (e.g. Hoekstra et al. 2006; Joron et al. 2006; Storz et al. 2007; Barrett et al. 2008; other papers in this issue). Yet, identifying the causative genetic polymorphisms underlying phenotypic divergence and elucidating the effect of natural selection on the molecular genetic architecture of adaptive traits and reproductive isolation remain daunting tasks, particularly in organisms lacking extensive sequence information and other genomic resources.
Combining different research approaches and targeting various functional and biological levels (e.g. variation at the DNA, gene expression and phenotypic levels) represents the best strategy for deciphering the genetic basis of evolutionary change and diversification driven by natural selection (Vasemagi & Primmer 2005; Stinchcombe & Hoekstra 2008). Moreover, progress towards this goal will be best accomplished by the comparative study of evolutionarily young and ecologically distinct lineages. Thus, genetic changes contributing to early steps of adaptive divergence and reproductive isolation can be studied before they become confounded by additional genetic differences between species that accumulate after speciation is complete and which erase traces of genomic islands of speciation (Schluter 2000; Via 2009).
In this paper, we review the main findings of a long-term research programme that we have been conducting on the ecological genomics of sympatric species pairs of whitefish (Coregonus spp.). We present this system as an example of how applying a combination of multiple approaches under the conceptual framework of the theory of adaptive radiation has yielded substantial insight into the genetic basis and evolutionary processes of divergence and reproductive isolation in a non-model species for which limited genomic resources are available. To this end, we adopted a strategy that consists of four main steps:
—Identifying phenotypic traits most likely to be adaptive.
—Elucidating the genetic bases of these phenotypes.
—Finding evidence for natural selection at the genome level in the wild.
—Identifying mechanisms of reproductive isolation and elucidating their molecular basis.
2. Whitefish biological attributes and background on evolutionary history and ecology
Many biological attributes make the genus Coregonus highly relevant for studying the genomic basis of adaptive divergence and reproductive isolation in an ecological context. It is the most speciose genus within the family Salmonidae, with a conservative estimate of 28 recognized taxa distributed throughout the Northern Hemisphere (Reshetnikov 1988), and the actual number of biological species could vastly exceed this figure (Kottelat & Freyhof 2007). Since the Pleistocene, members of this genus have evolved to occupy a wide array of habitats, which has led to substantial phenotypic variation both among taxonomically recognized species and among populations within taxa. Of particular relevance is the occurrence both in North America (taxon C. clupeaformis) and Eurasia (C. lavaretus) of lacustrine forms of whitefish that live in sympatry. Phylogeographic studies confirmed the young age of these sympatric forms that evolved postglacially, less than 15 000 yr BP (Pigeon et al. 1997; Østbye et al. 2006) and population genetics surveys showed that they remain reproductively isolated despite ongoing gene flow and hybridization (Lu & Bernatchez 1999; Lu et al. 2001; Østbye et al. 2005a). Moreover, the life history and ecology of sympatric whitefish populations have been well documented (e.g. Lu & Bernatchez 1999; Amundsen et al. 2004; Harrod & Kahilainen 2006; Hudson et al. 2007; Kahilainen et al. 2007) such that the genomics of phenotypic divergence and reproductive isolation can be interpreted in a natural, ecological context (reviewed in Bernatchez 2004; Østbye et al. 2005b; Landry et al. 2007a; Vonlanthen et al. 2009).
The natural setting that we most extensively investigated is located in the St. John River drainage (northeastern US and southeast Quebec) where sympatric forms of whitefish have been reported in six lakes. Geographical isolation during the Pleistocene caused genetic divergence between whitefish populations inhabiting distinct glacial refugia east and west of this area, but without distinctive phenotypic divergence between glacial races in allopatry (Bernatchez & Dodson 1990, 1991). Secondary contact of these evolutionary lineages subsequently occurred 12 000 yr BP within these lakes. Both ecological opportunity and competitive interactions translating into character displacement have contributed to the evolution of a limnetic dwarf form, which has diverged in sympatry from the benthic normal form (Bernatchez 2004; Landry et al. 2007a). The weight of evidence thus indicates that some accumulation of genetic differences during allopatric geographical isolation as well as subsequent ecological divergence in sympatry have led to reproductive isolation between dwarf and normal whitefish (Lu & Bernatchez 1998; Rogers & Bernatchez 2006; Renaut et al. 2009).
3. Identifying adaptive phenotypic traits
Dwarf and normal sympatric whitefish use limnetic and epibenthic habitat, respectively (Bernatchez et al. 1999; Landry 2009), and exhibit phenotype-environment associations with their respective niches. The most discriminating morphological trait between them is the gill-raker apparatus: dwarf whitefish typically have more numerous and more closely spaced gill-rakers than normal whitefish, resulting in a more efficient retention of smaller, planktonic prey (reviewed in Bernatchez 2004; Kahilainen et al. 2007). Dwarf whitefish are further characterized by a smaller size at maturity, slower growth rate, younger age at maturation and reduced fecundity relative to normal whitefish (Fenderson 1964; Rogers & Bernatchez 2005). In some lakes at least, limnetic (dwarf) fish also tend to suffer more predation than normal whitefish (Kahilainen & Lehtonen 2002). Normal and dwarf whitefish differ in heritable swimming behaviour: dwarf whitefish typically swim more actively and prefer higher positioning in the water column in experimental laboratory conditions (Rogers et al. 2002). Dwarf whitefish also have a higher metabolic rate, partly associated with the cost of more active swimming behaviour, and lower bioenergetic conversion efficiency (growth rate/consumption rate ratio; Trudel et al. 2001). Further evidence that natural selection has played an important role in the phenotypic divergence between dwarf and normal whitefish was provided by means of a Fst–Qst analysis (Spitze 1993; figure 1). By comparing the extent of divergence of numerous phenotypic traits (Qst) with that measured at neutral microsatellite markers (Fst) under both natural and experimental conditions, it was found that gill-rakers, growth and swimming behaviour were the traits most likely to have evolved under the effect of divergent selection and therefore were identified as the most relevant phenotypic targets for further investigation (Rogers et al. 2002).
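The logic of the Fst–Qst comparison can be illustrated with a minimal sketch. It assumes the commonly used definition for diploid, outcrossing organisms, Qst = Vb / (Vb + 2 Vw), where Vb and Vw are the additive genetic variance components between and within populations. The function names, trait labels, variance components and neutral Fst bound below are hypothetical illustrations, not values from the whitefish studies.

```python
def qst(var_between: float, var_within: float) -> float:
    """Qst from additive genetic variance between (Vb) and within (Vw) populations."""
    if var_between < 0 or var_within < 0:
        raise ValueError("variance components must be non-negative")
    total = var_between + 2.0 * var_within
    if total == 0:
        raise ValueError("no genetic variance: Qst is undefined")
    return var_between / total


def exceeds_neutral(qst_value: float, fst_upper_bound: float) -> bool:
    """True when trait divergence exceeds the upper bound of neutral divergence."""
    return qst_value > fst_upper_bound


# Hypothetical variance components (Vb, Vw) for two illustrative traits.
traits = {
    "strongly_divergent_trait": (4.0, 1.0),
    "weakly_divergent_trait": (0.2, 1.0),
}
# Hypothetical upper confidence bound of Fst at neutral markers.
fst_upper = 0.25

flags = {
    name: exceeds_neutral(qst(vb, vw), fst_upper)
    for name, (vb, vw) in traits.items()
}
print(flags)
```

Traits flagged in this way are candidates for divergent selection, because their differentiation exceeds what neutral markers predict; traits falling within the neutral range cannot be distinguished from drift on this basis alone.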
(a) Identifying adaptive phenotypic traits: insights from the transcriptome
Ever since the pioneering work of Britten & Davidson (1971) and King & Wilson (1975) on the mechanisms and evolutionary role of regulatory processes, there has been increasing recognition that variation in levels of gene expression represents a major source of evolutionary novelty, which in turn can lead to phenotypic divergence by natural selection (Oleksiak et al. 2002; Wittkopp 2007; Wray 2007; Fay & Wittkopp 2008). The development of microarray technologies, allowing the simultaneous detection of expression modulations at thousands of genes, offers a powerful means of assessing the importance of evolutionary change in gene regulation involved in adaptive population divergence (Bochdanovits et al. 2003; Ranz & Machado 2005; Matzkin et al. 2006). Moreover, microarrays represent an efficient tool for exploring the molecular mechanisms of life history trade-offs (Roff 2007).
Over the last few years, we have intensively used gene expression studies to investigate the molecular basis of adaptive divergence between dwarf and normal whitefish. Some of the specific questions addressed were: (i) is parallel phenotypic evolution observed between independently evolved dwarf and normal whitefish pairs accompanied by parallelism in expression of the genes potentially underlying phenotypic divergence? (Derome & Bernatchez 2006; Derome et al. 2006; Whiteley et al. 2008; Jeukens et al. 2009), (ii) what are the molecular mechanisms underlying life-history trade-offs (including sexual conflicts) potentially involved in the adaptive divergence of dwarf and normal whitefish? (Derome et al. 2008; St-Cyr et al. 2008), and (iii) what proportion of the genome is differentially regulated and expressed at different life history stages? (Nolte et al. 2009; Renaut et al. 2009).
These microarray studies were conducted both in controlled (laboratory) environments and two natural lakes, involved three life stages (embryos, juveniles and adults), and three tissues (white muscle, liver and brain). Given the non-availability of a specific whitefish microarray, total RNA extracts reverse-transcribed to cDNA were probed first on 3557-feature and later on 16 006-feature cDNA microarrays developed for the Atlantic salmon (Salmo salar) by the cGRASP (consortium for Genomic Research on All Salmon Project; Rise et al. 2004; von Schalburg et al. 2005). Studies have shown that this microarray platform performs well for closely related salmonid species, including lake whitefish (Rise et al. 2007) and that hybridization levels were comparable in both normal and dwarf whitefish (Derome et al. 2006).
Table 1 summarizes the general patterns of differentially expressed genes across three tissues at the adult stage. It first shows that several hundred genes were differentially expressed between adult dwarf and normal whitefish. Secondly, the proportion of differentially expressed genes out of the total number that were expressed was similar across tissues, varying from 11.1 per cent (brain) to 13.8 per cent (white muscle) and up to 15.9 per cent (liver). Out of those differentially expressed genes, a parallel pattern in level of gene expression between dwarf and normal whitefish from two natural lakes was observed for both white muscle and liver, with a higher proportion of parallel genes observed for the latter (2.39% versus 1.35%). This is of key interest, because parallel evolution among independent, closely related lineages is a predictable consequence of natural selection (Harvey & Pagel 1991; Schluter 2000). In addition to natural lakes, experiments conducted on fish (liver tissue) kept in controlled conditions revealed that 34 out of the 92 parallel genes observed in natural lakes showed the same differential patterns of gene expression. A heat-map presented in figure 2 shows how the differential pattern of expression at parallel genes unambiguously clusters dwarf and normal whitefish separately. Interestingly, the ratio of inter-individual variance of expression for genes showing parallelism of expression over the variance for genes with non-parallel pattern of expression showed that variance of expression was much smaller for the former than the latter in both muscle and liver (table 1). Moreover, for parallel genes observed in muscle and liver, the proportion of upregulated genes was higher in dwarf compared with normal whitefish, in accordance with the general observation of a more active metabolism in dwarf whitefish.
The hundreds of genes that showed differential patterns of transcription between dwarf and normal whitefish across the three tissues were classified into at least 30 different functional groups using information from various sources (e.g. cGRASP website, SwissProt/TrEMBL database, NCBI GO browser, KEGG Pathway database, SOURCE Database, EMBL Bioinformatic harvester, completed with references from the literature; see the electronic supplementary material, table S1). Of particular interest are those functional groups that were over-represented in terms of number of parallel genes showing differences between dwarf and normal whitefish relative to the total number of genes that were expressed in both forms for each functional group. Dwarf whitefish consistently showed significant over-expression of genes potentially associated with survival through enhanced activity (energy metabolism, muscle contraction, homeostasis, lipid metabolism and detoxification) whereas genes associated with growth (protein synthesis, cell cycle and cell growth) were generally upregulated in normal relative to dwarf whitefish (figure 2).
Gene expression was also screened using the 16 006 cDNA microarray to identify regulatory changes at embryonic and juvenile stages and compare them with previous results at the adult stage (table 2). This revealed that 16-week-old juvenile fish have 14 times more genes displaying significant regulatory divergence than embryos, for which very few genes showed significant differences in expression between dwarf and normal whitefish. Moreover, regulatory changes in 16-week-old juvenile fish matched patterns in adult fish, which suggests that gene expression divergence is established early in juvenile fish and persists throughout the adult phase. For example, we found that at least 26 genes in juvenile fish were also candidate genes involved in adaptive divergence in adult fish. Finally, real-time polymerase chain reaction was used to document transcriptional divergence for specific candidate genes identified from the above microarray experiments among North American (C. clupeaformis) and European (C. lavaretus) populations (Jeukens et al. 2009). Parallelism in expression of candidate genes was observed across whitefish limnetic and benthic pairs, including opsin genes (vision function) that were not screened by microarrays, which provided strong indirect support for the hypothesis that divergent natural selection has been acting similarly on the same genes in the adaptive radiation of whitefish in both North American and European lakes.
4. Elucidating the genetic bases of phenotypes
(a) Insights from pQTL (phenotype QTL) mapping
Mapping genetic regions underlying the expression of phenotypes represents an essential step towards identifying the genomic locations involved in adaptive divergence. The role of these genomic locations as potential barriers to gene flow can then be assessed (Rieseberg et al. 1999). This approach involves documenting the genetic architecture, i.e. the number, location and effect of genomic locations contributing to differentiation within and among populations or species (Rieseberg 1998; Orr & Turelli 2001). Such a genome-wide perspective is also essential for a complete understanding of the functional genomic response that occurs as evolutionary processes influence populations as they diverge (Wu & Ting 2004). In order to elucidate the genetic bases of phenotypic divergence between dwarf and normal whitefish, we have used linkage mapping to document the number and effects of QTL involved in controlling adaptive traits differing between them. We first assembled genetic linkage maps for two backcross (F1 × limnetic, F1 × benthic) hybrid families wherein over 900 AFLP and microsatellite loci were genotyped and positioned in 336 progeny (Rogers et al. 2007). The homology of mapped loci between families resolved 34 linkage groups that exhibited 83 per cent colinearity among linked loci between these two families. These maps represented the basis for assessing the genetic architecture of eight adaptive traits in the two hybrid backcross families where the phenotypes of individual offspring were determined throughout their life history (Rogers & Bernatchez 2005, 2007). A total of 34 pQTL linked to these traits and distributed over 13 linkage groups were identified using interval mapping (figure 3).
(b) Insights from eQTL (expression QTL) mapping
The analysis of the genetic architecture of transcriptome variation is increasingly considered as a powerful way to further our understanding of the mechanistic basis of adaptive divergence (Roff 2007; Mackay et al. 2009). QTL analyses of gene expression traits (eQTL) treat transcript abundance as a quantitative trait and apply traditional QTL mapping techniques to localize genetic determinants of gene expression (Gibson & Weir 2005; Gilad et al. 2008). In addition, it is possible to compare genomic eQTL locations with pQTL to locate candidate genes underlying phenotypic traits (Wentzell et al. 2007). This ‘phenomics’ approach provides a solid framework for examining transcriptional underpinnings of phenotypic divergence.
We used microarray analyses to localize the genetic determinants of gene expression for progeny of a backcross whitefish family previously used to construct the linkage (and pQTL) map (Derome et al. 2008; Whiteley et al. 2008). The analyses were performed for two tissues on a subset of 66 (white muscle tissue) and 57 (brain tissue) of the same individual progeny used to build the genetic and pQTL map. We attempted to elucidate the genomic distribution of muscle and brain eQTL and to determine the extent to which genes controlling transcriptional variation may underlie adaptive divergence in dwarf and normal whitefish. Overall, we identified hundreds of significant eQTL in both brain and muscle that were distributed over numerous linkage groups with several overlapping with pQTL (figure 3).
More specifically, results for white muscle revealed 261 eQTL involving 138 unigenes (for a mean of about 1.9 eQTL per gene) distributed over 24 linkage groups and consisting of 15 eQTL that co-localized with pQTL. Results obtained for brain revealed a similar number of eQTL (249) but involving many more unigenes than for muscle (237 for a mean of 1.05 per gene) distributed over 28 linkage groups with up to 50 eQTL that co-localized with pQTL. This allowed identification of candidate genes for species pair divergence involved in various functions, namely energetic metabolism, protein synthesis, cell growth, muscle contraction and neural development, on the basis of co-localization of eQTL for genes expressed in both tissues with previously identified adaptive pQTL. For instance, 88 per cent of eQTL-pQTL co-localization involved growth rate and condition factor QTL, two traits central to the adaptive divergence of whitefish species pairs (figure 3). We found several cases of co-localization between eQTL and pQTL that could potentially have a functional association, although this remains to be rigorously investigated. For example, an eQTL regulating the expression of an ATP synthase β chain gene in white muscle involved in energy metabolism overlapped with a pQTL underlying condition factor on linkage group 32. In muscle also, an eQTL for troponin which plays a role in fast muscle contraction overlapped with a pQTL underlying burst swimming on linkage group 33.
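Co-localization of eQTL with pQTL amounts to asking whether two mapped intervals on the same linkage group overlap. The sketch below shows one simple way to express that test. The gene and trait names and linkage groups for the two examples come from the text, but every interval position (in centimorgans) is a hypothetical placeholder, as are the `Qtl` type and the `colocalized` helper; the actual mapping analyses may have used different criteria.

```python
from typing import List, NamedTuple, Tuple


class Qtl(NamedTuple):
    name: str
    linkage_group: int
    start_cm: float  # hypothetical interval start (cM)
    end_cm: float    # hypothetical interval end (cM)


def colocalized(eqtl: List[Qtl], pqtl: List[Qtl]) -> List[Tuple[str, str]]:
    """Return (eQTL, pQTL) name pairs whose intervals overlap on the same linkage group."""
    pairs = []
    for e in eqtl:
        for p in pqtl:
            same_group = e.linkage_group == p.linkage_group
            overlap = e.start_cm <= p.end_cm and p.start_cm <= e.end_cm
            if same_group and overlap:
                pairs.append((e.name, p.name))
    return pairs


# Positions are invented for illustration only.
eqtl_list = [
    Qtl("ATP synthase beta chain", 32, 10.0, 25.0),
    Qtl("troponin", 33, 5.0, 15.0),
    Qtl("hypothetical gene", 25, 0.0, 10.0),
]
pqtl_list = [
    Qtl("condition factor", 32, 20.0, 40.0),
    Qtl("burst swimming", 33, 12.0, 30.0),
]

pairs = colocalized(eqtl_list, pqtl_list)
print(pairs)
```

Overlap of confidence intervals identifies candidates only; as noted above, a functional link between a co-localized eQTL and pQTL still has to be tested directly.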
The most prominent feature of both brain and muscle eQTL maps was the non-random distribution of eQTL, which translated into a prevalence of ‘eQTL hotspots’ characterized by the co-localization on the same chromosomal region of many eQTL associated with genes of very distinct functions. This is of prime interest to decipher the genetic architecture of adaptive divergence since regions with a high number of distal eQTL might harbour master regulator genes (e.g. transcription factors; Gilad et al. 2008). When a hotspot is defined as the co-localization of four or more eQTL, 47 per cent of all 249 eQTL identified in brain were associated with 12 hotspots distributed over eight linkage groups (Whiteley et al. 2008). These hotspots contained nine eQTL on average, but up to 32 eQTL on the same hotspot were observed. Interestingly, 95 per cent of all eQTL grouping at a given hotspot showed the same directionality in additive effect. An even more pronounced non-random pattern of eQTL distribution was observed in white muscle with 41 per cent of all eQTL grouping into six hotspots distributed across four linkage groups (Derome et al. 2008). Each hotspot contained an average of 18 eQTL that co-localized to the same position. As for the brain results, the majority (80%) of all eQTL grouping at a given hotspot showed the same directionality in additive effect.
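The hotspot criterion described above (four or more eQTL at one position) can be sketched as a simple tally. The threshold of four comes from the text; representing a position as a (linkage group, marker) pair, the marker labels and the toy data are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Hashable, List

MIN_EQTL_PER_HOTSPOT = 4  # threshold stated in the text


def find_hotspots(
    eqtl_positions: List[Hashable], min_count: int = MIN_EQTL_PER_HOTSPOT
) -> Dict[Hashable, int]:
    """Map each position carrying at least min_count eQTL to its eQTL count."""
    counts = Counter(eqtl_positions)
    return {pos: n for pos, n in counts.items() if n >= min_count}


def fraction_in_hotspots(
    eqtl_positions: List[Hashable], hotspots: Dict[Hashable, int]
) -> float:
    """Proportion of all eQTL that fall within a hotspot."""
    if not eqtl_positions:
        return 0.0
    return sum(hotspots.values()) / len(eqtl_positions)


# Toy data: one entry per eQTL, keyed by (linkage group, hypothetical marker label).
positions = (
    [(25, "m1")] * 5
    + [(25, "m2")] * 4
    + [(3, "m7")] * 3
    + [(8, "m2")]
)

hotspots = find_hotspots(positions)
share = fraction_in_hotspots(positions, hotspots)
print(hotspots, round(share, 3))
```

In this toy example two positions qualify as hotspots and together hold 9 of the 13 eQTL; whether such clustering is non-random would still need to be tested against a null distribution of eQTL positions.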
The most extreme case of non-random eQTL distribution was observed for muscle on linkage group 25, where two tightly linked hotspots, respectively, contained 34 and 53 eQTL. Each of these contained eQTL regulating genes with numerous functions. The most represented functions were protein synthesis, cell growth, energy metabolism and muscle contraction. Transcripts involved in protein synthesis and cell growth exhibited systematically opposed additive effects to those associated with energy metabolism and muscle contraction (Derome et al. 2008), which is consistent with the observed trade-off in life-history traits and differential gene expression distinguishing dwarf and normal whitefish (St-Cyr et al. 2008). Directional predominance of additive effects provides strong evidence for the role of directional selection in shaping genome architecture (Orr 1998). Therefore, these results suggest that directional selection has been acting on patterns of gene regulation associated with these two tightly linked chromosomal regions. Moreover, such a concentration of eQTL belonging to diverse functional groups suggests that single master regulators located at these positions could control their expression. Indeed, these hotspots contained eQTL for genes known to play fundamental roles in transcription regulation (e.g. DPY-30-like protein gene, Histone H1 homologue protamine, small ubiquitin-like modifier (SUMO)). Putative master regulatory genes were also observed at other eQTL hotspots, such as zinc-finger proteins, both for brain and muscle. These could represent mutations related to gene regulation and be interpreted as a form of early substitution of large effect theoretically predicted during adaptive divergence (Orr 2005), and hypothetically play a central role in the divergence of dwarf and normal whitefish.
Another prominent feature of both eQTL maps was a pronounced sex bias in transcriptional genetic architecture observed for both brain and white muscle, with more eQTL observed in males in the case of brain and vice versa for white muscle (Derome et al. 2008; Whiteley et al. 2008). In addition, we detected differentially expressed transcripts associated with eQTL segregating in sex-specific datasets and mostly belonging to functional groups that differentiate dwarf and normal whitefish in natural populations.
5. Finding evidence for natural selection at the genome level in the wild
Understanding the response of genetic variation to natural selection is required to demonstrate that the mechanism causing divergence between populations or species at the genome level is adaptive. To this end, genome scans have been increasingly used to identify genomic regions resisting the homogenizing influence of gene flow and therefore likely to be under divergent selection (Nosil et al. 2009a,b). This approach relies on the principle that following hybridization, recombinant genotypes with maladaptive combinations of loci will be removed, resulting in a differential pattern of introgression across the genome. Loci (or linked loci) incurring reduced fitness in recombinant genotypes will be characterized by a greater genetic differentiation relative to neutral or selectively advantageous loci (Barton 2001). Therefore, identifying loci linked to genes implicated in differential adaptation and/or reproductive isolation can be achieved by quantifying patterns of genetic differentiation between populations that are still exchanging genes (Beaumont & Nichols 1996; Beaumont & Balding 2004; Foll & Gaggiotti 2008). This requires a ‘population genomics’ approach wherein a large number of loci (hundreds or more) are genotyped to accurately estimate the expected level of genetic differentiation under neutrality, and the proportion of loci linked to genes implicated in adaptation and reproductive isolation. Genome scans can also reveal parallel trends of divergence through the analysis of multiple populations, consequently offering stronger support for the role of natural selection in adaptive trait evolution. When applied to populations in an early stage of divergence, genome scans reveal the mosaic nature of the genome whereby markers in divergently selected genomic regions will reflect the evolutionary history of adaptive divergence and ecologically based reproductive isolation (Via & West 2008).
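The outlier principle behind a genome scan can be sketched as follows: compute differentiation at each locus, derive a threshold from a neutral expectation, and flag loci exceeding it. This is a deliberately simplified illustration. It assumes codominant biallelic loci with known allele frequencies (AFLP markers are dominant and require additional estimation steps), uses the heterozygosity-based Fst = (Ht - Hs) / Ht for two equally weighted populations, and replaces the simulated neutral distribution with a placeholder list. All locus names and numbers are hypothetical.

```python
from typing import Dict, List, Tuple


def fst_biallelic(p1: float, p2: float) -> float:
    """Fst = (Ht - Hs) / Ht for one biallelic locus in two equally weighted populations."""
    p_bar = (p1 + p2) / 2.0
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    if h_total == 0.0:
        return 0.0  # locus is monomorphic overall
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    return (h_total - h_within) / h_total


def neutral_threshold(neutral_fst: List[float], quantile: float = 0.99) -> float:
    """Empirical upper quantile of a neutral Fst distribution."""
    ordered = sorted(neutral_fst)
    index = min(len(ordered) - 1, int(quantile * len(ordered)))
    return ordered[index]


def find_outliers(observed: Dict[str, Tuple[float, float]], threshold: float) -> List[str]:
    """Loci whose Fst exceeds the neutral threshold."""
    return [
        locus for locus, (p1, p2) in observed.items()
        if fst_biallelic(p1, p2) > threshold
    ]


# Placeholder for Fst values that would come from neutral simulations.
neutral_values = [i / 1000.0 for i in range(200)]
threshold = neutral_threshold(neutral_values)

# Hypothetical allele frequencies in the two sympatric forms.
observed_loci = {"locus_a": (0.9, 0.1), "locus_b": (0.5, 0.6)}
outlier_loci = find_outliers(observed_loci, threshold)
print(threshold, outlier_loci)
```

Loci flagged in more than one independent population pair would provide the parallel signal discussed above, which is harder to explain by drift alone.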
We performed a genome scan by which the pattern of genetic differentiation obtained using 440 AFLP loci was compared with that expected under neutrality estimated by simulations in four sympatric pairs of dwarf and normal whitefish (Campbell & Bernatchez 2004). A total of 48 loci with an average of 14 loci per lake showed restricted gene flow relative to neutral expectation, suggesting a role of directional selection on their divergence. Further insight into the identity of these ‘anonymous’ AFLP markers was gained by combining information obtained from population genomics with that of pQTL and eQTL mapping (Rogers & Bernatchez 2007; Derome et al. 2008; Whiteley et al. 2008). Thus, homology between markers used in mapping and population genomics was found for 180 markers and for 24 of the 48 outlier loci. As illustrated in figure 4, 19 of these 24 outliers corresponded to either pQTL (n = 9), eQTL (10 positions involving eQTL for 21 genes), or both (six positions involving 22 genes).
For pQTL, loci exhibiting a signature of selection were associated with QTL relative to other regions of the genome more often than expected by chance alone (Rogers & Bernatchez 2007). Among the 15 such pQTL, eight were associated with growth and two of these showed patterns of parallelism among independent lakes. Figure 5 illustrates the case of pQTL CATA104 associated with growth on linkage group 4 and showing highly reduced gene flow between dwarf and normal whitefish in three different lakes. Moreover, these two parallel QTL outliers exhibited segregation distortion in mapping families, supporting the hypothesis that adaptive divergence contributing to parallel reductions of gene flow among natural populations may also cause genetic incompatibilities (see below; Rogers & Bernatchez 2007).
Table 3 presents the list of the 10 AFLP outliers distributed over 10 different linkage groups that were also associated with either white muscle or brain eQTL, among which six also co-localized with pQTL. These represent a total of 43 genes involved in numerous functional groups, protein synthesis and energy metabolism being the most represented. Most genes falling into these two functional groups were also previously identified as candidates in the adaptive divergence between dwarf and normal whitefish on the basis of parallelism in transcription. One example of an eQTL outlier is CATA073, located on linkage group 6. This locus showed highly reduced gene flow between dwarf and normal whitefish in three lakes, is linked with an eQTL for cytochrome c oxidase subunit VI, a gene involved in the energy-producing oxidative phosphorylation (Whiteley et al. 2008), and is a possible candidate for divergence hitchhiking (sensu Via 2009) with a pQTL for condition factor (Rogers & Bernatchez 2007; figure 5). Evidence for parallelism in transcription or significant difference in expression in at least one dwarf/normal pair was also found for the majority of genes falling in other functional groups (table 3). It is also noteworthy that three out of 10 AFLP outliers also corresponded to eQTL hotspots, all of them containing a putative master regulatory gene.
6. Mechanisms of reproductive isolation and their genetic basis
The ecological theory of adaptive radiation predicts that reproductive isolation evolves as a by-product of divergent natural selection, which shares a common genetic basis with adaptation through pleiotropic interactions (Schluter 2000). However, the underlying genetic basis of reduced hybrid fitness (including inviability) that causes reproductive isolation in cases of adaptive divergence has remained elusive for most species (Coyne & Orr 2004). By examining the genetic basis of post-zygotic isolation during the early stages of population divergence, it is possible to determine the factors accounting for a reduction in hybrid fitness and to gain insight into the evolutionary forces that lead to the formation of these barriers. This includes concurrently investigating intrinsic and extrinsic factors influencing reproductive isolation (Rice & Hostert 1993). Intrinsic post-zygotic isolation is independent of the environment and arises from genomic incompatibilities caused by divergent developmental systems. On the other hand, extrinsic post-zygotic isolation is environmentally dependent and associated with lower hybrid fitness in specific environments. Investigating the genetic basis of intrinsic and extrinsic factors was particularly relevant in dwarf and normal whitefish since both the accumulation of genetic differences in allopatry and ecological divergence in sympatry following secondary contact may have contributed to their reproductive isolation. This was achieved through laboratory experiments and linkage mapping.
(a) Intrinsic post-zygotic isolation
Evidence for intrinsic post-zygotic isolation between dwarf and normal whitefish has come from two sources. However, observations were obtained from two mapping families only and should be interpreted accordingly. First, reduced sperm performance observed in backcross individuals relative to parental and F1 hybrids was consistent with genomic incompatibilities that create a range of negative fitness effects in post-F1 hybrids (Whiteley et al. 2009). Second, we observed a pronounced pattern of segregation distortion in the two backcross mapping families for nearly one third of mapped markers across 28 linkage groups (Rogers et al. 2007). Moreover, there was a significant correlation between the percentage and direction of segregation distortion for 64 homologous loci between both maps. During embryonic development, such pronounced locus-specific deviation from Mendelian segregation is likely to reflect genomic incompatibilities associated with intrinsic hybrid inviability (Vogl & Xu 2000). In order to test this hypothesis, we analysed patterns of Mendelian segregation at loci with known linkage associations among backcross embryos sampled at critical periods of development against larvae within the same family that successfully survived to emergence (Rogers & Bernatchez 2006). Our data provided evidence for the role of embryonic mortality as a strong intrinsic reproductive isolation barrier (figure 6). At its peak, the mortality rate of backcross hybrids was 2.5 to 3.7 times greater than that observed previously in F1 hybrids (Lu & Bernatchez 1998) and 5.3 to 6.5 times greater than observed in pure crosses. When comparing embryos that died at different times against larvae that successfully hatched, several loci distributed along different linkage groups exhibited temporal deviation in Mendelian segregation ratios, which ultimately led to significant segregation distortion in the surviving progeny.
On linkage group 3 in particular, a positive relationship between four tightly linked loci suggested that increased embryonic mortality during development was a function of incompatibilities on this segment of the genome between dwarf and normal genomes (figure 6).
Interestingly, combining information from the white muscle eQTL map with that from linkage mapping revealed that one of these severely distorted markers (CAAG143.7) overlapped with an eQTL hotspot on Lg3. This hotspot contained 11 genes involved in many biological functions, the most important being energy metabolism (cytochrome c oxidase subunit VI, creatine kinase, ATP synthase, ubiquinol-cytochrome c reductase, NADH dehydrogenase, selenoprotein P), but also protein synthesis (40S ribosomal protein S20), muscle contraction (parvalbumin, actin), and reproduction (cathepsin L precursor; figure 6). It is also noteworthy that parallelism in patterns of gene expression was observed for the majority of those genes in microarray studies (Derome et al. 2006; St-Cyr et al. 2008) and that several loci linked to Lg3 had previously been found to be significantly resistant to introgression between whitefish from distinct glacial races (Rogers et al. 2001).
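The notion of segregation distortion used in this section can be made concrete with a small sketch. It assumes a marker that is expected to segregate 1:1 among backcross progeny and applies a chi-square goodness-of-fit test with one degree of freedom. The function name and the genotype counts are hypothetical illustrations; this is not presented as the procedure used in the mapping studies cited above.

```python
import math

def segregation_distortion(n_a, n_b):
    """Chi-square goodness-of-fit test against an assumed 1:1 Mendelian ratio.

    n_a, n_b: observed counts of the two genotype classes at one marker
    (hypothetical inputs). Returns (chi-square statistic, p-value).
    """
    total = n_a + n_b
    if total == 0:
        raise ValueError("at least one genotyped individual is required")
    expected = total / 2.0
    chi2 = ((n_a - expected) ** 2 + (n_b - expected) ** 2) / expected
    # For one degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Illustrative counts only: 70 versus 30 progeny among 100 genotyped.
chi2, p_value = segregation_distortion(70, 30)
print(f"chi2 = {chi2:.1f}, p = {p_value:.2e}")
```

Applying such a test separately to embryos that died at successive stages and to surviving larvae is one simple way to express the temporal change in segregation ratios described above; in practice a correction for multiple testing across markers would also be needed.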
(b) Evidence for genetically determined extrinsic post-zygotic isolation
Extrinsic, ecologically based, post-zygotic isolation was evidenced by a comparative analysis of the timing and synchronicity of larval emergence between pure and hybrid crosses, a trait that is critical to the survival of fishes in nature (Cushing 1990). We also tested for QTL associated with time to emergence to determine whether extrinsic post-zygotic isolation barriers may contribute to hybrid inviability (Rogers & Bernatchez 2006). Under controlled environmental conditions, the dwarf and normal experimental families exhibited a synchronicity similar to what was observed in the natural populations (Chouinard & Bernatchez 1998). In the F1 hybrids, mean hatching time was intermediate to the dwarf and normal crosses, offering evidence that developmental time to emergence was under additive genetic control. In contrast, the time to emergence in backcrosses was highly asynchronous: time to emergence was delayed approximately 11 days while the total hatching span was 30 days longer than observed in pure crosses. Moreover, hybrid backcrosses exhibited increased variance with respect to time to emergence, which was 10 times higher when compared with natural dwarf and normal populations. In the linkage map, interval analyses provided a genetic basis for this trait as they revealed one significant and one suggestive QTL associated with time to emergence, which, respectively, explained 29 and 27 per cent of the phenotypic variance. Additionally, the patterns of hatching in backcrosses were indicative of transgressive segregation. That is, variation in time to emergence in these hybrids exceeded the combined variation of both parental populations.
(c) Molecular basis for reproductive isolation: further insights from the transcriptome
As exemplified above, both hybrid breakdown and transgressive segregation may explain the underlying basis of post-zygotic reproductive isolation (either intrinsic or extrinsic) between dwarf and normal whitefish. The manifestation of such extreme traits supposes non-additive, epistatic, gene interactions (Coyne & Orr 2004; but see also Rieseberg et al. 1999) causing abnormal gene regulation when mixing divergent regulatory elements into a common genetic background. For instance, Landry et al. (2007b) showed how the regulation of coevolved cis regulatory regions and trans transcription factors could be disrupted and lead to phenotypic novelties in hybrids. Genome-wide analyses of the transcriptome confirmed that gene misexpression may underlie reproductive isolation mechanisms in F1 hybrids between species that diverged many hundreds of thousands or millions of years ago (e.g. Ranz et al. 2004; Landry et al. 2007b). However, patterns of gene expression in young species pairs such as whitefish, as well as post-F1 hybrid generations have been little explored, such that the underlying transcriptomic basis of reproductive isolation mechanisms remains largely unknown.
In recent studies using the 16 006-feature salmon cDNA microarray, Renaut et al. (2009) contrasted gene expression divergence between pure normal and dwarf whitefish with that of first generation hybrids and second-generation backcrosses at the same embryonic and juvenile stages investigated by Nolte et al. (2009). The goal was to identify genes misexpressed in hybrids that could potentially link to abnormal phenotypic variation observed during embryonic development in previous studies. We found that very few transcripts (five of 4950 expressed or 0.1%) differed in mean expression level between pure forms and hybrids at the embryonic stage, in contrast to 16-week-old juvenile fish for which 617 out of 5359 transcripts (11%) differed significantly. Of particular interest were six key metabolic genes (glyceraldehyde-3-phosphate dehydrogenase, fructose–bisphosphate aldolase A, beta-enolase, trypsin-1 precursor, cytochrome c oxidase polypeptide VIa and nucleoside diphosphate kinase) that were divergent between normal and dwarf whitefish at both juvenile and adult stages (Nolte et al. 2009). More specifically, we found that, in F1-hybrid juveniles, those genes mostly showed an intermediate pattern of expression compared with parents. An intermediate level of gene expression for those metabolic genes in hybrids could contribute to an atypical physiological phenotype and to an inferior, ecologically maladapted, individual. As such, reproductive isolation of lake whitefish could be seen as a by-product of divergent selection acting on metabolic genes, in accordance with previous studies showing that environmentally driven natural selection may be key in explaining incipient population divergence (e.g. Gow et al. 2007; van der Sluijs et al. 2008).
As illustrated in figure 7, we also found clear evidence for severe gene misexpression whereby non-additive gene interactions explained a large fraction of hybrid inheritance patterns in backcross (54%) compared with F1-hybrids (9%). Inter-individual variance in the level of gene expression was also approximately twice as large in hybrids relative to pure crosses. This translated into a pronounced transgressive segregation of transcription variation, especially in backcross hybrids for which the expression of 2622 (embryos) and 2316 (juveniles) genes exhibited exaggerated variance, where the range of backcross expression extended outside the range of both parents' values. Such a high number of transgressive genes significantly exceeded that expected by chance alone (Renaut et al. 2009). In particular, five of the nine most transgressive transcripts closely matched with three different homologues (Immunoglobulin binding protein (protein folding); translation elongation factor alpha 1 (mRNA translation) and 40S ribosomal protein s11 (mRNA translation)) identified as essential for early embryonic development of zebrafish, Danio rerio (Amsterdam et al. 2004). Knockdown mutants in D. rerio for those genes show visible embryonic defects and almost invariably die prior to, or soon after hatching. Transgressive segregation may underlie post-zygotic isolation mechanisms given that transgressive hybrids often suffer a highly reduced survival (Barton 2001; Coyne & Orr 2004). Therefore, it is plausible that the overall patterns of transgressivity we observed, and particularly misexpression of several key developmental genes, may contribute to abnormal hybrid development and increased embryonic mortality identified as a significant intrinsic post-zygotic reproductive isolation mechanism between dwarf and normal whitefish (Lu & Bernatchez 1998; Rogers & Bernatchez 2006).
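The range-based notion of transgressive expression described above (hybrid values extending outside the range spanned by both parental forms) can be illustrated with a minimal sketch. This is a deliberately simplified criterion with hypothetical expression values and a hypothetical function name; it does not reproduce the statistical procedure of Renaut et al. (2009), which also compared the number of transgressive genes with that expected by chance.

```python
def transgressive_direction(dwarf, normal, hybrid):
    """Classify one transcript by comparing hybrid expression with the
    combined range of the two parental forms.

    dwarf, normal, hybrid: non-empty lists of expression values
    (hypothetical, e.g. normalized log intensities) for one transcript.
    Returns "below", "above", "both" or "none".
    """
    parental_low = min(min(dwarf), min(normal))
    parental_high = max(max(dwarf), max(normal))
    below = min(hybrid) < parental_low
    above = max(hybrid) > parental_high
    if below and above:
        return "both"
    if below:
        return "below"
    if above:
        return "above"
    return "none"

# Illustrative values only.
print(transgressive_direction([1.0, 1.2], [2.0, 2.3], [0.5, 1.5, 3.0]))
print(transgressive_direction([1.0, 1.2], [2.0, 2.3], [1.1, 1.9]))
```

A raw range comparison of this kind is sensitive to sample size and outliers, which is why a null expectation is needed before the count of transgressive genes can be interpreted.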
7. Summing up the evidence for the case of natural selection driving adaptive divergence and reproductive isolation in whitefish
Integrating a combination of multiple ecological and genomic approaches under the conceptual framework offered by the theory of adaptive radiation has proven highly efficient for highlighting the role of natural selection in driving the adaptive divergence and reproductive isolation between dwarf and normal whitefish, and has yielded substantial insight into the genomic basis of these evolutionary processes.
First, combining information obtained both from a comparison of phenotypic and genome wide transcriptomic differentiation identified the phenotypic (including physiological) traits most likely to be adaptive and therefore of most pertinent focus for further investigation of the genetic basis of adaptive divergence. These data also provided strong support for the mechanistic hypothesis that the adaptive divergence and evolution of distinct life-history strategies between dwarf and normal whitefish involve differential trade-offs between growth (and correlated fecundity) and survival which, in turn, is mediated through the higher energetic cost of occupying the limnetic relative to the benthic trophic niche. Thus, higher metabolic rate, more active swimming associated with planktonic feeding and predator avoidance, as well as reduced bioenergetic conversion efficiency may constrain available energy for growth, fecundity and reproduction at older ages in dwarf whitefish. In addition, the analysis of transcriptomic divergence between dwarf and normal whitefish revealed that their differentiation involves at least several hundred genes (conservatively between 1 and 3 per cent of the genome), thus pointing out that it may be misleading to draw general conclusions about the genetic basis of adaptive divergence based on the investigation of single or few candidate genes. Moreover, parallelism in patterns of gene transcription, as well as patterns of inter-individual variance in expression, provided strong evidence for the role of natural selection in the evolution of differential regulation of genes involving a vast array of physiological processes. This provided a mechanistic, genomic basis for the observed trade-off in life-history traits distinguishing dwarf and normal whitefish. Finally, the high number of differentially expressed genes at the juvenile and adult stages may provide a broad genomic basis for extrinsic post-zygotic isolation (Nosil et al. 2009a,b).
Second, the combined, integrated use of linkage, phenotypic and gene expression mapping provided an efficient means to elucidate the genetic architecture underlying the most probable adaptive phenotypic traits differentiating dwarf and normal whitefish. It also revealed several associations between specific gene and phenotypic expression and identified key genomic regions potentially harbouring genes (master regulators) of high pleiotropic effects on gene expression. The prevalence of ubiquitous directionality in additive effects was consistent with the action of natural selection on single master regulators controlling expression of a suite of numerous genes, which belong to diverse functional groups involved in the adaptive divergence of whitefish and ultimately modulate the organismal phenotype. Finally, combining pQTL and eQTL mapping provided the identity of numerous candidate markers for which the influence of natural selection in wild populations can be methodically tested.
Third, integrating pQTL and eQTL mapping with population genomics of multiple pairs of dwarf and normal whitefish efficiently identified genomic regions resisting the homogenizing influence of gene flow and therefore likely to be under divergent selection. Thus, the AFLP genome scan indicated that a relatively small proportion of the genome (approximately 1–2%, but still representing hundreds of loci over the whole genome) might be linked to genes implicated in the adaptive divergence and reproductive isolation of dwarf and normal whitefish. The majority (19 out of 24 or 80%) of ‘anonymous’ AFLP outliers could be associated with either a pQTL, eQTL or both. This showed that outliers were mainly associated with either growth, swimming behaviour phenotypes or expression of genes associated with these functions (protein synthesis and energy metabolism). Most eQTL outliers also corroborated previously identified candidate genes on the basis of parallelism in transcription. These observations provided compelling evidence for the role of natural selection in restricting effective gene flow in regions modulating the expression of genes underlying contrasting whitefish life-history strategies that evolved for adapting to distinct environments and resources.
Fourth, combining experimental studies, pQTL and eQTL mapping with a comparative analysis of genome-wide transcription patterns identified several mechanisms of reproductive isolation and provided insight into their genetic basis. This first revealed that genetically based asynchronous emergence resulting from the admixture of distinct genomes in hybrids provides an underlying basis for extrinsic, ecologically based post-zygotic isolation. This phenotype would most certainly be selectively unfavourable in nature, since asynchronous emergence has been commonly correlated to increased risk of starvation, decreased opportunities for optimal growth and increased risk of predation (e.g. Cushing 1990). Furthermore, we demonstrated that highly reduced sperm performance and survival during embryonic development in backcross hybrids translated into intrinsic post-zygotic isolation. Incompatibility in specific genomic regions, perhaps initiated during divergence in allopatry and reinforced by subsequent sympatric ecological specialization of dwarf and normal populations most probably underlies this intrinsic post-zygotic isolation mechanism. These observations also raised the hypothesis that hybrid inviability is associated with misregulation of genes with essential functions. This hypothesis was confirmed by genome-wide analysis of the transcriptome which revealed abnormal patterns of expression at numerous genes in hybrids, including key genes involved in embryonic development as well as genes playing central roles in energy metabolism and involved in the adaptive divergence of dwarf and normal whitefish. Gene misexpression in hybrids thus adds to other intrinsic and extrinsic reproductive isolation factors between dwarf and normal whitefish.
Overall, evidence for reduced sperm performance, and genetically based increased embryonic mortality in hybrids followed by asynchronous emergence indicates that intrinsic and extrinsic mechanisms of reproductive isolation are not mutually exclusive; both processes are jointly involved in the ongoing speciation process in whitefish. Moreover, since we substantiated that geographical isolation as well as ecological divergence that subsequently occurred in sympatry have contributed to the origins of these forms, it indicates that both processes of ecological and non-ecological speciation (sensu Rundell & Price 2009), such as mutation-order speciation, defined as the evolution of reproductive isolation by the chance occurrence and fixation of different alleles (Schluter 2009), have been involved in the origin of dwarf and normal whitefish in eastern North America. However, this may not be generalized to all cases of sympatric divergence in Coregonus since there is no evidence for an initial phase of allopatry for Eurasian sympatric whitefish populations despite similar and in some cases, more pronounced patterns of phenotypic divergence observed among those (Douglas et al. 1999; Østbye et al. 2006).
8. Future directions: stepping into the next-generation sequencing era
The combined use of gene mapping, population genomics and transcriptomics allowed the identification of nearly 500 genes representing a vast array of physiological functions that are probably candidates for the adaptive divergence and reproductive isolation of dwarf and normal whitefish (see the electronic supplementary material, table S1). The precise role and implication of these genes still need to be confirmed in whitefish, which will be the main focus of our research programme in the following years. Indeed, the identification of candidate genes does not constitute an end in itself, but rather the beginning of a new set of evolutionarily relevant questions (Stinchcombe & Hoekstra 2008). In the specific case of the whitefish research programme, some of the key questions or issues to be addressed in future studies are: What are the size, distribution and nature of genomic islands of speciation, that is, chromosomal regions that remain highly differentiated between populations despite ongoing gene flow (Turner et al. 2005)? This issue is of particular interest since recent studies suggested that surprisingly large genomic regions around divergently selected loci can be protected from recombination by divergence hitchhiking (Via 2009). If proven general, this process could neutralize one of the longstanding criticisms of sympatric isolation, that is, the difficulty of maintaining linkage disequilibrium between genes involved in ecological divergence and those causing assortative mating (Coyne & Orr 2004). In the same way that it is invoked in the process of ecological speciation between dwarf and normal whitefish, can natural selection be strong enough to maintain species cohesiveness by favouring the spread of alleles that are advantageous within a same form (dwarf or normal; Morjan & Rieseberg 2004)? Are adaptations to similar ecological niches that evolved independently owing to the same mutations (Linnen et al. 2009)?
Do alleles conferring their specificity to dwarf and normal whitefish emerge from standing genetic variation or new mutations? What is the role of positive selection in maintaining functional divergence at the sequence level of those genes? What are the relative contributions of regulatory and structural (protein) changes to the speciation process of whitefish? What is the relative role of genomic mechanisms (e.g. cis- versus trans- acting regulation, allele-specific differential expression, differential alternative splicing, expression of different isoforms, epigenetic interactions) in the differential pattern of transcription we observed at hundreds of genes? Transcriptomics studies showed that energy metabolism is the main biological function involved in the divergence between dwarf and normal whitefish. Given the fundamental role of mitochondria as the main source of cellular energy production, which involves networks of interactions between nuclear (slow evolving) and mitochondrial (fast evolving) encoded genes, and since historical isolation led to the divergence of the mitochondrial genome of the two glacial races of whitefish (Bernatchez & Dodson 1990; Lu et al. 2001), can breakdown or mis-regulation of mitochondrial bioenergetic functions in hybrids be a major player in the speciation process of dwarf and normal whitefish, as revealed recently in other systems (Ellison & Burton 2008; Gershoni et al. 2009)? There is mounting evidence that selection has been acting more strongly on dwarf than normal whitefish (Bernatchez 2004; Derome et al. 2006; St-Cyr et al. 2008). Given the pronounced transcriptomic divergence observed between them, does natural selection favour canalization (defined as the evolved ability to produce a consistent expression of a same phenotype in different environments) more importantly in dwarf than normal whitefish (Landry 2009)?
Similarly, is canalization of expression more pronounced for genes identified as candidates for the adaptive divergence of whitefish than other, ‘non-candidate’ genes?
Addressing those issues will necessitate the continued application of the same strategy we have used over the last years, that is, combining different research approaches and targeting various functional and biological levels, including the integration of the most recent technological developments. Namely, next generation sequencing technologies, which are rapidly revolutionizing evolutionary biology research, will be of paramount interest (Rokas & Abbot 2009). These approaches have already shown their efficiency in various applications, including rapid SNP (single nucleotide polymorphism) discovery (Barbazuk et al. 2007), genome sequencing of ecologically important models (Vera et al. 2008) as well as accurate gene expression analysis (Torres et al. 2007; Lipson et al. 2009). High throughput pyrosequencing (454 Life Sciences, Margulies et al. 2005) is particularly relevant to the study of non-model organisms such as whitefish since it yields longer sequencing reads than any other method, allowing more accurate de novo sequence assemblies in the absence of reference genomes.
As an illustration of its usefulness to our research programme, we recently performed a 454 pyrosequencing analysis, which allowed assembling, de novo, over 130 Mb of non-normalized cDNA using data from dwarf and normal whitefish (Renaut et al. 2010). Our main objectives with this first ‘next-gen’ sequencing application were to gather a large dataset of SNP markers to be used in further mapping and genome scan studies of candidate genes and to gain further insight into transcriptional divergence between dwarf and normal whitefish, which included quantifying allele-specific expression, something not possible with previous cDNA microarray analyses. We identified 6094 putative SNPs in 2674 contigs (mean size: 576 bp, range: 101–6000 bp), comprising 1540 synonymous and 1734 non-synonymous mutations for a genome-wide non-synonymous to synonymous substitution rate ratio (pN/pS) of 0.37. About 90 SNPs that showed highly significant divergence in allele frequencies between normal and dwarf whitefish were identified. Among those, SNPs in genes annotated to energy metabolic and protein synthesis functions were the most abundant. This corroborated previous evidence that genes involved in these two functions are prime candidates underlying the adaptive divergence of normal and dwarf whitefish.
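Because non-synonymous SNPs outnumber synonymous ones in the counts above while pN/pS is well below one, it may help to recall that the ratio is normally computed per site rather than from raw counts. The sketch below assumes that standard per-site definition; the numbers of non-synonymous and synonymous sites are not reported here, so only their ratio is back-calculated from the reported figures. The function names are hypothetical and the exact estimator used by Renaut et al. (2010) may differ.

```python
def pn_ps(n_nonsyn, n_syn, nonsyn_sites, syn_sites):
    """Per-site ratio of non-synonymous to synonymous polymorphism."""
    return (n_nonsyn / nonsyn_sites) / (n_syn / syn_sites)

def implied_site_ratio(n_nonsyn, n_syn, reported_ratio):
    """Ratio of non-synonymous to synonymous sites implied by the counts
    and a reported pN/pS value, assuming the per-site definition above."""
    return (n_nonsyn / n_syn) / reported_ratio

raw_count_ratio = 1734 / 1540                        # about 1.13
site_ratio = implied_site_ratio(1734, 1540, 0.37)    # about 3.04
print(f"raw count ratio = {raw_count_ratio:.2f}")
print(f"implied non-synonymous:synonymous site ratio = {site_ratio:.2f}")
```

Under this assumption, the reported values imply roughly three non-synonymous sites for every synonymous site, which is why a raw count ratio above one is compatible with a per-site ratio of 0.37.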
One example of such a candidate SNP was found in a Triosephosphate isomerase gene, which plays an important role in glycolysis and is essential for efficient energy production. This gene was screened between dwarf and normal whitefish from Cliff Lake by means of MALDI-TOF mass spectrometry (Sequenom MassARRAY system, Ehrich et al. 2005; Nolte et al. 2009, unpublished results; figure 8). The Fst value between dwarf and normal for this gene was 0.685, making it an outlier in the distribution of Fst values observed in the AFLP genome scan (Campbell & Bernatchez 2004). At the same time, this gene was previously identified as differentially expressed between dwarf and normal whitefish (Nolte et al. 2009). Furthermore, an AFLP marker (CCTC051), itself an outlier with an Fst value of 0.871, was linked to an eQTL for Triosephosphate isomerase on linkage group 19 (Derome et al. 2008). This eQTL was located within an eQTL hotspot comprising other genes involved in various functions, including protein synthesis (growth), muscle contraction, immune system as well as a putative transcription factor (Zinc finger protein 35; figure 8). Although preliminary and requiring rigorous confirmation, these results suggest that the genomic region on linkage group 19 may correspond to an important genomic island of divergence that is maintained by natural selection despite ongoing gene flow between sympatric dwarf and normal whitefish. These results also suggest that both structural and regulatory divergence may be involved in this adaptive differentiation. This example clearly illustrates the potential and efficiency offered by the combined use of the latest genomic approaches in elucidating the genomic bases of adaptive divergence and reproductive isolation in a non-model organism, as well as deciphering the role of natural selection in driving these evolutionary processes.
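For readers less familiar with Fst, the following sketch shows how the statistic relates to allele frequencies at a single biallelic SNP in two populations. It uses the classical heterozygosity-based definition, Fst = (HT - HS) / HT, and assumes equal sample sizes with no correction for sampling error. The allele frequencies are hypothetical, and the estimators actually used in the studies cited above are not specified here and may differ.

```python
def fst_biallelic(p_dwarf, p_normal):
    """Fst for one biallelic SNP in two populations of equal weight.

    p_dwarf, p_normal: frequency of the same allele in each population
    (hypothetical inputs). Uses Fst = (HT - HS) / HT, where HS is the mean
    within-population expected heterozygosity and HT is the expected
    heterozygosity of the pooled population.
    """
    p_mean = (p_dwarf + p_normal) / 2.0
    h_total = 2.0 * p_mean * (1.0 - p_mean)
    if h_total == 0.0:
        return 0.0  # locus is monomorphic overall, no differentiation
    h_within = (2.0 * p_dwarf * (1.0 - p_dwarf)
                + 2.0 * p_normal * (1.0 - p_normal)) / 2.0
    return (h_total - h_within) / h_total

# Illustrative frequencies only.
print(round(fst_biallelic(0.9, 0.1), 2))
```

Values close to one, such as those reported for the outlier loci, therefore correspond to populations that are nearly fixed for alternative alleles, whereas identical frequencies give a value of zero.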
Quite clearly, we are witnessing an exciting era during which many gaps of knowledge about mechanisms linking genes and selection during the course of speciation will be filled and which in turn, will further strengthen Darwin's theory of evolution by means of natural selection. Undoubtedly, non-model species, such as whitefish, studied in their ecological context will have an increasingly important role to play in expanding our knowledge in this field. There have never been more stimulating times to train the ‘next-gen’ of young evolutionary biologists in ecological genomics.
We thank Hans Ellegren for kindly inviting the senior author to contribute to this special issue, Eric Taylor for his constructive and most useful comments, and Eric Normandeau for his skilful text, table and figure editing. This research programme on whitefish ecological genomics has been supported by grants to L.B. from the Natural Sciences and Engineering Research Council of Canada (NSERC), an E.W.R. Steacie Fellowship (NSERC) as well as by the Canadian Research Chair in Genomics and Conservation of Aquatic Resources.
One contribution of 11 to a Theme Issue ‘Genomics of speciation’.
© 2010 The Royal Society