Deciphering the genetic bases that drive animal diversity is one of the major challenges of modern biology. Although it was proposed four decades ago that animal evolution was mainly driven by changes in cis-regulatory DNA elements controlling gene expression rather than in protein-coding sequences, only now are powerful bioinformatics and experimental approaches available to accelerate studies of how the evolution of transcriptional enhancers contributes to novel forms and functions. In the introduction to this Theme Issue, we start by defining the general properties of transcriptional enhancers, such as modularity, and the coexistence of two different mechanisms that maintain the enhancer grammar over evolutionary time: tight sequence conservation and transcription factor-binding site shuffling. We discuss past and current methods used to identify cell-type-specific enhancers and provide examples of how enhancers originate de novo, change and are lost in particular lineages. In the central part of this Theme Issue, we then analyse examples of how the molecular evolution of enhancers may change form and function. Throughout this introduction, we present the main findings of the articles, reviews and perspectives contributed to this Theme Issue, which together illustrate some of the great advances and current frontiers in the field.
In 1859, the publication of Darwin's On the Origin of Species provided a powerful natural explanation of how ‘endless forms most beautiful and most wonderful’ have been created on this planet. After 150 years of advances in genetics and in the molecular principles of heredity, the molecular basis of life's diversification can now be understood in general terms. In the past decade, whole-genome sequencing has allowed direct observation, in unprecedented breadth, of the genetic changes that separate related species, helping us to understand how genetic variation is generated and opening up the possibility of grasping what genetic changes make, for example, an elephant different from a mouse.
Confirming pioneering observations, whole-genome comparisons show that protein-coding sequences and gene repertoires do not vary much between related organisms, with the exception of proteins related to functions such as olfaction, reproduction and immune defence [2,3]. Coding exons are embedded in a sea of intronic and intergenic non-coding sequence, the vast majority of which is devoid of specific functions and constitutes ‘junk’ DNA. However, the non-coding part of the genome also includes functional regulatory regions, such as enhancers. Transcriptional enhancers determine where, when and how much a protein-coding gene is expressed in every animal tissue. Because they encode such critical spatio-temporal and quantitative information, their sequences are expected to be under strong purifying selection and to mutate at a slower rate than flanking, neutrally evolving regions. Even though enhancers tend to be evolutionarily conserved, in general they evolve faster than coding regions [4,5], suggesting that changes in regulatory DNA play an important role in evolution. Although this does not mean that physiological and morphological changes cannot be caused by mutations in coding exons [6,7], many characteristics of enhancers and other cis-regulatory regions indicate that organismal evolution is mostly driven by changes in gene regulation, as theorized over 40 years ago [1,8,9]. In fact, as we shall see in §§6 and 7, evidence has accumulated in recent years showing that mutations in regulatory regions are an important source of evolutionary innovation [10–13].
Although the relatively rigid genetic code that determines the amino acid sequence of proteins was deciphered soon after the discovery of DNA structure, the regulatory code of the genome remains unsolved. Understanding the complex interplay between regulatory DNA regions and transcription factors (TF), regulatory RNAs, signalling pathways and epigenetic marks that determines gene expression is therefore among the main current challenges of modern molecular genetics. The further our knowledge of the regulatory code progresses, the closer we come to understanding how evolution operates at the molecular level to bring about morphological and physiological innovation. This field of research, once mainly pursued by those interested in the basic mechanisms of gene expression regulation, is growing vigorously in medical and human genetics departments interested in understanding human disease, as developmental defects, cancer and other conditions are increasingly found to be caused by mutations in enhancers [14–20].
In the introduction to this Theme Issue on Enhancers Evolution and Animal Diversity, we shall review some general principles of how transcriptional enhancers change over time and, in particular, how these changes can lead to phenotypic innovation. Some principles will be illustrated with our studies on the regulation of the proopiomelanocortin gene (Pomc; see figure 1 for a schematic of the structure of mouse Pomc and its regulatory regions). Pomc encodes a prohormone that is expressed in the arcuate nucleus of the hypothalamus and in the corticotropes and melanotropes of the pituitary, and that plays critical roles in the control of energy balance and the stress response in vertebrates.
2. Enhancers: general considerations
In animals, enhancers are one of the main types of transcriptional regulatory regions, others being promoters, promoter-tethering elements, locus control regions, silencers, barrier elements and insulators. Enhancers are cis-acting segments of DNA, usually mapped down to 200–500 bp, that control the expression of nearby genes. Enhancer activity is thought to depend on long-range communication between the enhancer region and the promoter, which is achieved by DNA looping mediated by specific proteins such as cohesin and the Mediator complex, as well as by relocation of the active gene from the periphery to the interior of the nucleus. Identified enhancers are often located in the vicinity of the genes they control, but some have been found up to 1 Mb away, even within introns of other genes (for an excellent recent collection of papers on distal enhancers, see the Discussion Meeting Issue).
Since their discovery over 30 years ago, enhancers have been found to harbour several transcription factor-binding sites (TFBS) in a particular spatial order, defining what can be called the enhancer grammar. Detailed functional dissection of different enhancers led to the development of two extreme models of enhancer organization: (i) the enhanceosome and (ii) the billboard model. The enhanceosome model derives from work on an enhancer that triggers interferon-β (IFN-β) transcription in response to viral infection. Exhaustive analyses of this enhancer found that it is composed of a tight ensemble of binding sites for several TF that must be present in a precise order and spacing, as even minor mutations that prevent binding or alter the distance between bound TF abolish enhancer function entirely. TF in the enhanceosome thus work synergistically as a unit. By contrast, billboard enhancers are composed of TFBS arranged in a looser and more flexible way, such that removing a binding site diminishes enhancer performance but does not abolish it completely. The enhancer for stripe 2 of Drosophila even-skipped (eve), for instance, contains TFBS for activators (Bicoid, Hunchback) and repressors (Krüppel, Giant) that, when individually mutated, do not abolish its function but rather change stripe width and intensity in transgenic experiments. For many enhancers analysed in less depth, there is evidence that removal of large portions of conserved enhancer sequence does not abolish enhancer function, also pointing to a flexible, billboard-like organization, as is the case for the neuronal Pomc enhancers nPE2 and nPE1 (figure 1; [30,31]). Few enhancers have been studied in as much depth as those for IFN-β and eve stripe 2, but it is possible that in most enhancers more rigid and more flexible subsegments coexist.
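The contrast between the two extreme models can be made concrete with a toy calculation. The Python sketch below is only an illustration under stated assumptions: the site names and weights are hypothetical, the enhanceosome is reduced to an all-or-none requirement for every site (it ignores the order and spacing constraints described above), and the billboard is modelled as a simple additive sum of independent site contributions.

```python
from typing import Mapping, Set

# Hypothetical per-site contributions in arbitrary units (not measured values).
SITE_WEIGHTS: Mapping[str, float] = {"A": 0.4, "B": 0.3, "C": 0.2, "D": 0.1}


def enhanceosome_output(present: Set[str], weights: Mapping[str, float]) -> float:
    """All-or-none readout: full activity only if every site is intact."""
    return 1.0 if set(weights) <= present else 0.0


def billboard_output(present: Set[str], weights: Mapping[str, float]) -> float:
    """Additive readout: each intact site contributes independently."""
    return sum(w for site, w in weights.items() if site in present)


if __name__ == "__main__":
    intact = set(SITE_WEIGHTS)
    # Knock out one site at a time and compare the two readouts.
    for mutated in sorted(SITE_WEIGHTS):
        remaining = intact - {mutated}
        print(mutated,
              enhanceosome_output(remaining, SITE_WEIGHTS),
              round(billboard_output(remaining, SITE_WEIGHTS), 2))
```

Under these assumptions, losing any single site drops the enhanceosome-like output to zero, whereas the billboard-like output decreases only by that site's weight, mirroring the qualitative difference between the IFN-β and eve stripe 2 enhancers.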
The 362-bp core region of the sparkling (spa) enhancer of Pax2, which drives expression to cone cells in the Drosophila compound eye, seems to have an intermediate organization: spacing and order of TFBS are important for enhancer function, as in the enhanceosome, but one of the critical regions for spa function (region 1) can be relocated without affecting expression [32,33]. A third model of TFBS organization was found in enhancers involved in heart development and differentiation. In this alternative (iii) TF collective mode of enhancer activity, a large number of cardiogenic TF are cooperatively recruited to activate enhancers with an apparently lax motif grammar, such that motifs for some factors may be absent and some necessary TF are instead recruited via cooperative protein–protein interactions.
Apart from the order and number of TFBS, another important variable in enhancer grammar is TF-binding affinity. Genome-wide analyses indicate that many interactions between TF and DNA are relatively weak, and these are regarded as likely to be non-functional, as are DNA regions bound at low occupancy. However, in this issue, Ramos & Barolo show that a weak interaction between the TF Cubitus interruptus (Ci) and a decapentaplegic enhancer in response to Hedgehog signalling is nevertheless very important for the correct interpretation of the Hedgehog gradient in the fly embryo, implying that low-affinity sites can be functional in some circumstances. Enhancer grammar also depends on interactions between bound TF, which is reflected in the conservation of the distances between TFBS needed to facilitate these interactions. In this issue, Guturu et al. combine DNA sequence and protein structural data to predict TF-bound regions within phylogenetically conserved elements of the genome, generating an original, freely available database of potential TF complexes for future studies on enhancer organization and function.
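One common, simplified way to quantify the predicted strength of a TF–DNA interaction is to score candidate sites with a position weight matrix (PWM) of log-odds values. The sketch below is a minimal illustration, assuming a hypothetical 4-bp motif, a uniform background of 0.25 per base and a pseudocount of 0.5; it scans the forward strand only. PWM scores are at best a rough proxy for in vivo affinity, and a low score does not by itself imply a non-functional site.

```python
import math
from typing import Dict, List, Tuple

BASES = "ACGT"

# Hypothetical motif: per-position base counts from imagined aligned sites.
COUNTS: List[Dict[str, int]] = [
    {"A": 8, "C": 0, "G": 1, "T": 1},
    {"A": 0, "C": 9, "G": 0, "T": 1},
    {"A": 1, "C": 0, "G": 9, "T": 0},
    {"A": 0, "C": 1, "G": 0, "T": 9},
]


def log_odds_matrix(counts: List[Dict[str, int]], background: float = 0.25,
                    pseudocount: float = 0.5) -> List[Dict[str, float]]:
    """Convert counts to log2(p(base at position) / p(base in background))."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + pseudocount * len(BASES)
        pwm.append({b: math.log2((column[b] + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def scan(seq: str, pwm: List[Dict[str, float]]) -> List[Tuple[int, float]]:
    """Score every motif-length window on the forward strand (uppercase ACGT only)."""
    width = len(pwm)
    return [(i, sum(pwm[j][seq[i + j]] for j in range(width)))
            for i in range(len(seq) - width + 1)]


if __name__ == "__main__":
    pwm = log_odds_matrix(COUNTS)
    for pos, score in scan("TTACGTAAGCTACGAGG", pwm):
        print(pos, round(score, 2))
```

In this toy example the consensus-like window ACGT scores highest, while a mismatched window such as ACGA still receives a positive but lower score; a strict score cut-off would discard such weaker sites, which is precisely the kind of site the Ci example suggests can matter.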
3. Enhancer modularity
During embryonic development and in each tissue of the adult body, different cell types express distinct groups of TF and are exposed to various cell-signalling pathways, giving rise to a unique combinatorial regulatory code that is interpreted by DNA regulatory regions, such as enhancers, to determine ultimately whether a particular gene will be on or off in a given cell type. Genes with complex expression patterns have been experimentally shown to have several enhancers, each of which drives expression in particular cell types or at particular time points. This modular organization allows each enhancer to control gene expression in a strict spatial and temporal domain, independently of the other enhancers and of the basal promoter. For example, the mouse Sonic hedgehog homolog gene (Shh), which encodes a secreted morphogen, has at least six enhancers controlling its expression in several parts of the neural tube, as well as one distal enhancer controlling expression in the limb bud [14,40]. Neuronal enhancers nPE1 and nPE2 drive Pomc expression to hypothalamic neurons independently of the pituitary enhancer or promoter, and vice versa (figure 1; [41,42]). The functional modularity of enhancers is best illustrated by targeted mutagenesis experiments. For example, removal of a distal, limb-specific enhancer of Shh causes limb truncations in mutant mice without causing aberrant phenotypes in the notochord or neural tube, which also express Shh. It is important to note, however, that enhancers may not be absolutely modular. For instance, some enhancers can synergize with each other, as in the case of the Shh forebrain enhancers in zebrafish. Recently, a thorough characterization of the 200-kb regulatory landscape of the mouse Fgf8 gene, which encodes a secreted signalling protein, has shown that the position of enhancers relative to the gene they control is important for fine-tuning the gene's expression pattern.
In any case, enhancer modularity has important evolutionary consequences, as mutations in a particular enhancer will change the expression of a gene in a particular region with no or minor effects in other regions, i.e. with little or no associated pleiotropy (see §6). Another important, emerging feature of gene regulation is that many metazoan genes possess more than one enhancer driving partially or completely overlapping expression patterns [45,46]. In flies, many developmental genes carry two enhancers with overlapping functions; the enhancer closer to the gene was called the ‘primary’ enhancer, whereas the one farther away was named the ‘shadow’ enhancer. The existence of functionally overlapping enhancers is not restricted to developmental genes. For instance, Pomc has two distinct enhancers that drive expression to the same population of hypothalamic neurons [31,42]. Even the expression of Pomc in the pituitary, long thought to be driven only by the proximal enhancer and promoter, was recently found to depend also on a distal enhancer with the same regulatory activity as the proximal one (figure 1). Functionally overlapping enhancers may confer robustness, buffering gene expression against environmental and genetic perturbations [48,49]. Enhancers with partially redundant activity are also hypothesized to be a potential source of evolutionary novelty, as such systems might be more tolerant of mutations that lead to new expression patterns.
4. Identification of enhancers and functional studies
Identifying enhancers in genomes is crucial for understanding the complexity and mechanisms of gene regulation. Traditionally, regulatory regions have been identified and mapped by cloning candidate sequences upstream of a minimal promoter fused to a reporter gene and testing their transcriptional activity in cell lines or in transgenic organisms. Although more laborious and expensive to make, transgenic animals have the great advantage of providing complete spatio-temporal expression information simultaneously in all tissues and cell types of an intact, healthy animal. Increasingly, laboratories map enhancers using 100–200 kb bacterial artificial chromosomes, which allow regulatory regions to be tested in a context that resembles the endogenous one more closely than small constructs do. Detailed studies using these methods have determined that genes involved in embryonic development are controlled by several enhancers, as in the example of the mouse Shh gene mentioned in §3, or the chicken Sox2 gene, which encodes a TF and is controlled by at least 11 enhancers spread over a 50-kb region surrounding the gene. This is not surprising, because the expression of developmental genes needs to be precisely controlled, both spatially and temporally, in many different tissues. However, even genes not involved in development may have several enhancers, like Pomc, which is controlled by two distal enhancers in the hypothalamus and by one distal and one proximal enhancer in the pituitary (figure 1; [41,42]). These and many other examples indicate that the number of enhancers in every animal genome is substantially larger than the number of protein-coding genes.
Beginning in the early 2000s, the sequencing of the genomes of several species allowed for the identification of enhancers and other non-coding DNA elements by phylogenetic footprinting. This technique is based on the idea that DNA sequences that play a functional role evolve more slowly than non-functional sequences, which are freer to accumulate neutral mutations. Thus, comparing the genomes of an appropriate set of organisms allows one to identify DNA elements under evolutionary constraint and to distinguish them from neutrally evolving sequences. Conserved non-coding elements (CNE), when experimentally tested, often turn out to display enhancer activity [55–58]. Deeply conserved CNE (i.e. conserved in all vertebrates) are more often found around genes involved in early development, presumably because mutations in enhancers that control the precise regulatory patterns of such genes often cause deleterious developmental phenotypes [59,60]. Recently, the sequencing of 29 mammalian genomes has allowed evolutionary constraint to be mapped in great detail, showing that 4.2% of mammalian genome sequence is phylogenetically conserved. Within the conserved fraction, around 68% is located in intronic or intergenic regions (i.e. outside exons and promoters), and at least 30% of the conserved fraction overlaps chromatin marks typical of enhancers. Some CNE display extreme levels of conservation (ultraconserved elements, UCE), reaching 100% sequence identity in mammals, and some orthologous CNE have even been found in both invertebrates and vertebrates [5,63]. The challenges posed by the study of CNE, in particular the study of their evolutionary dynamics and function, are the topic of a review by Harmston et al. in this issue, while another review by Maeso et al. deals with the challenge of identifying and analysing transphyletic CNE.
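The core logic of phylogenetic footprinting can be sketched in a few lines. The example below is a simplified illustration, not the method used in any study cited here: it assumes a precomputed pairwise alignment (with '-' for gaps), slides a window along it, flags windows at or above a percent-identity threshold and merges overlapping hits into candidate conserved elements. The window size, the 70% cut-off and the toy sequences are hypothetical; more rigorous approaches compare substitution rates across many species against a model of neutral evolution rather than relying on raw pairwise identity.

```python
from typing import List, Tuple


def percent_identity(a: str, b: str) -> float:
    """Identity over aligned columns; any column containing a gap counts as a mismatch."""
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


def conserved_windows(aln_a: str, aln_b: str, window: int = 10,
                      min_identity: float = 70.0) -> List[Tuple[int, int, float]]:
    """Return (start, end, identity) for windows at or above the threshold."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    hits = []
    for start in range(len(aln_a) - window + 1):
        pid = percent_identity(aln_a[start:start + window],
                               aln_b[start:start + window])
        if pid >= min_identity:
            hits.append((start, start + window, pid))
    return hits


def merge(hits: List[Tuple[int, int, float]]) -> List[Tuple[int, int]]:
    """Merge overlapping windows into candidate conserved elements."""
    merged: List[Tuple[int, int]] = []
    for start, end, _ in hits:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


if __name__ == "__main__":
    # Toy alignment: a shared 10-bp block followed by a divergent block.
    species_a = "ACGTACGTAC" + "TTTTTTTTTT"
    species_b = "ACGTACGTAC" + "GGGGGGGGGG"
    print(merge(conserved_windows(species_a, species_b)))
```

With these settings the toy alignment yields a single candidate element spanning alignment columns 0 to 13; the element extends slightly past the truly shared block because windows that still meet the 70% threshold overhang into the divergent region, one reason why window-based calls only approximate element boundaries.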
In recent years, chromatin immunoprecipitation coupled to microarray hybridization (ChIP-chip) or to DNA sequencing (ChIP-seq) has allowed genome-wide mapping of TFBS and of chromatin (‘epigenetic’) marks in the form of specific histone post-translational modifications [66,67]. Tracing the distribution of such features, as well as of regions of open chromatin detected by whole-genome mapping of DNase I hypersensitive sites, can be used to identify potential enhancers at a genome-wide scale [68,69]. Enhancer regions are typically bound by clusters of TF [70–72] and are often associated with the transcriptional co-activators p300/CBP (histone acetyltransferases) and with components of the Mediator complex [23,73]. Active enhancers are also associated with specific histone marks such as histone H3 lysine 4 monomethylation (H3K4me1) and histone H3 lysine 27 acetylation (H3K27ac), and are depleted of histone H3 lysine 4 trimethylation (H3K4me3), a mark of promoters (reviewed in [60,61]). In the human genome, the ENCODE Project identified around 400 000 elements bearing enhancer-like chromatin signatures in the cell lines analysed, while around 230 000 potential enhancers were found in the mouse genome using similar techniques. The utility of sequence conservation and of whole-genome ChIP techniques for identifying enhancers and studying their evolution is reviewed in this issue by Sakabe & Nobrega.
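As a rough illustration of how the chromatin signatures listed above can be combined, the sketch below applies simple presence/absence rules to per-region mark calls: H3K4me3 is taken as promoter-like, H3K4me1 together with H3K27ac as a candidate active enhancer, and H3K4me1 alone as enhancer-like but not acetylated. The region names and boolean inputs are hypothetical; genome-wide annotations typically rely on quantitative signal and probabilistic segmentation rather than hard rules, and any such label remains only a prediction of enhancer function.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """Presence/absence calls for one genomic region (hypothetical input)."""
    name: str
    h3k4me1: bool  # histone H3 lysine 4 monomethylation
    h3k27ac: bool  # histone H3 lysine 27 acetylation
    h3k4me3: bool  # histone H3 lysine 4 trimethylation, a promoter mark


def classify(region: Region) -> str:
    """Toy rules mirroring the chromatin signatures described in the text."""
    if region.h3k4me3:
        return "promoter-like"
    if region.h3k4me1 and region.h3k27ac:
        return "candidate active enhancer"
    if region.h3k4me1:
        return "enhancer-like, no H3K27ac"
    return "unannotated"


if __name__ == "__main__":
    regions = [
        Region("r1", h3k4me1=True, h3k27ac=True, h3k4me3=False),
        Region("r2", h3k4me1=False, h3k27ac=True, h3k4me3=True),
        Region("r3", h3k4me1=True, h3k27ac=False, h3k4me3=False),
    ]
    for r in regions:
        print(r.name, classify(r))
```

Checking H3K4me3 first encodes the depletion of this promoter mark at enhancers; reordering the rules would change how regions carrying all three marks are labelled.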
It is important to note, however, that the evidence provided by genome-wide mapping of genomic features is only an indication of potential enhancer activity and should not be taken as equivalent to regulatory function [77,78]. Indeed, the large genomes of complex organisms are expected to be alive with biochemical activity, including TF binding, that has no significant physiological consequences for the organism [77,79–81]. As an indication of this, it has recently been shown that randomly generated DNA sequences can display a significant degree of transcriptional regulatory activity in mammalian cells, indicating that even expression assays in cells or transgenic models may be deceptive and that more detailed experimental evidence is necessary before enhancer function is assigned to a sequence. Around 1 kb upstream of mouse Pomc, we serendipitously found a non-conserved regulatory region that drives reporter expression to the subgranular layer of the dentate gyrus of the hippocampus of transgenic mice (figure 1) and is now used as a reliable marker of newborn neurons in this brain region. Because Pomc is not expressed in the hippocampus, this enhancer activity may well be used by another gene in the vicinity or simply represent a consistent artefact, especially considering that this DNA sequence is not conserved in other mammals.
5. Molecular evolution of enhancers
Although purifying selection keeps transcriptional enhancers evolving at a slow pace relative to non-functional DNA, enhancers do change through the accumulation of point mutations and small insertions and deletions. In addition, new enhancers can appear by chance when random mutations create clusters of TFBS, and existing ones can be lost when large deletions eliminate enhancer sequences. Such processes can be driven by natural selection as well as by purely neutral mechanisms. Ultimately, turnover of TFBS within enhancers, together with the birth and elimination of enhancers, causes the repertoire of regulatory elements to change over evolutionary time. An indication of this is that, whereas a third of the coding bases are conserved between mammals and amphioxus (a chordate that belongs to a sister group of vertebrates), less than 1% of CNE bases (a proxy for enhancers) are conserved between these groups. Thus, TFBS reshuffling and enhancer turnover seem to be two pervasive mechanisms that stand in the way of discovering a universal transcriptional regulatory code, in contrast to the straightforward genetic code, unravelled in the 1960s, which allows protein sequence to be predicted directly from DNA sequence. The variety of rules already found in the regulatory code is perplexing, as illustrated by the titles of two recent papers on the subject: the review ‘Conserved expression without conserved regulatory sequence: the more things change, the more they stay the same’ and the research paper ‘Minor change, major difference: divergent functions of highly conserved cis-regulatory elements subsequent to whole genome duplication events’. Whereas the former addresses the concept that the regulatory logic of a functional enhancer can be retained in the absence of detectable sequence conservation, the latter indicates that even small mutations in highly conserved enhancers may lead to profound functional differences.
Because understanding the transcriptional code and its impact on the evolution of form and function is proving more challenging than anticipated, efforts must be redoubled to study how gene expression is modified by the birth, change or loss of enhancers.
(a) The birth of an enhancer
Genetic mechanisms leading to the birth of novel enhancers include (i) insertion of transposable elements (TEs), (ii) de novo mutations and (iii) chromosomal rearrangements that promote enhancer adoption. The large contribution of TE-derived sequences to genome composition provides a substantial amount of raw material with the potential to evolve into novel functional cis-regulatory elements. Probably the first convincing case was a sequence derived from a HERV-E retroposon that was co-opted as a parotid-specific enhancer of the human salivary α-amylase 1C gene (AMY1C). Interestingly, this TE was inserted into the 5′ proximal flanking region of a paralogous copy of the ancestral pancreatic AMY2B gene in the lineage leading to primates, which therefore acquired the ability to taste rewarding sweet sugars produced by the enzymatic processing of starch. More recently, the 29 Mammals Project found more than 280 000 conserved, TE-derived non-coding elements [61,90]. Although these elements carry potential cis-regulatory function, exaptation (co-option) of TE-derived sequences into transcriptional enhancers has been rigorously demonstrated in transgenic animals in only a handful of cases. For example, we have found that the neuronal Pomc enhancers nPE1 and nPE2 originated from two independent exaptation events, an example of convergent evolution of transcriptional enhancers [31,91]: whereas nPE2 derived from a CORE-SINE retroposon in the lineage leading to mammals, nPE1 is a more recent placental acquisition derived from a mammalian apparent LTR (MaLR) retroposon.
Eichenlaub & Ettwiller have recently discovered that enhancers can originate de novo after minor changes in previously existing non-regulatory sequences. Taking advantage of the massive gene loss that followed the last whole-genome duplication in teleosts, these authors identified four ancient exons that lost their coding capacity in teleosts and were exapted as transcriptional enhancers of nearby genes, as demonstrated in transgenic medaka embryos. Chromosomal rearrangements may also promote the adoption of an enhancer previously used by another gene, as shown by the group of Mike Levine in the beetle Tribolium castaneum, in which a cardiac enhancer located in the 3′ flanking region of the ladybird gene was relocated by an inversion to the 5′ end of the neighbouring gene C15. This inversion therefore drives cardiac expression of C15 in Tribolium.
(b) Enhancers change, function not always
That enhancers can change without modifying their regulatory activity is best illustrated by the even-skipped stripe 2 enhancer (S2E) in species of the genus Drosophila, in which the shuffling of TFBS within the S2E of different species leads to the same expression pattern in transgenic Drosophila melanogaster embryos [94,95]. Even the orthologous eve enhancers of sepsid flies, which are distant relatives of Drosophila and whose enhancer sequences are very different, can drive equivalent expression in transgenic D. melanogaster embryos, showing that enhancers with different arrangements of TFBS can correctly interpret a TF code and give rise to the same expression output. Importantly, in complementation experiments, the divergent Drosophila pseudoobscura S2E perfectly rescued the embryonic lethal phenotype of D. melanogaster mutants carrying homozygous deletions of S2E, proving that two enhancers with highly different sequences can be functionally interchangeable, at least when tested in the laboratory. This outcome of stabilizing selection is reminiscent of the politically cynical maxim of Giuseppe Tomasi di Lampedusa's novel ‘The Leopard’ (1958), whose protagonist is the Sicilian Prince of Salina: ‘everything needs to change, so everything can stay the same’. In vertebrates, this scenario is illustrated by experiments in which mammalian enhancers drive appropriate expression in zebrafish embryos even in the absence of obvious sequence identity. For example, Fisher et al. analysed human enhancers of the RET locus in zebrafish and found that 11 out of 13 drove equivalent reporter gene expression, even though no orthologous zebrafish enhancers could be found by sequence comparison. In this issue, Domené et al. show that the mammalian Pomc enhancers nPE1 and nPE2 drive expression to POMC neurons of zebrafish embryos, even though their exapted origin from TEs in the early stages of the mammalian radiation [30,31] rules out the possibility that they are orthologous to the teleost enhancers.
In contrast to the cases described above, conservation and high sequence identity can be functionally deceptive, as reported for the zebrafish Shh enhancers ar-D (5′ proximal), ar-A (intron 1) and ar-C (intron 2). These enhancers are highly conserved in sequence and location with their mouse orthologues but direct expression to different territories in each animal, demonstrating that the function of these structurally conserved enhancers has diverged during vertebrate evolution. Similarly, but at the paralogue level, the ar-C enhancers of the fugu shha and shhb genes carry minor sequence changes that nevertheless modify the expression pattern of reporter genes in transgenic zebrafish embryos.
Hox genes also provide an excellent setting in which to look for enhancer evolution between duplicate paralogues. Half a billion years ago, the ancestral Hox gene cluster became quadruplicated in the vertebrate lineage, generating new paralogous Hox genes through two consecutive rounds of tetraploidization. Individual knockout mice have demonstrated that Hoxa1 and Hoxb1 are each indispensable for proper hindbrain segmentation. Analysis of these mutant mice showed that the remaining active paralogue was unable to compensate for the loss-of-function copy, suggesting subfunctionalization. Indeed, Hoxa1 is known to have lost the ability to autoregulate its expression levels, whereas Hoxb1 lost responsiveness to retinoic acid. In a reverse evolution experiment, Tvrdik & Capecchi reconstructed the ancestral Hox1 gene by inserting the autoregulatory enhancer of Hoxb1 into the 5′ flanking region of Hoxa1 in a Hoxb1 null-allele background. The resulting enhancer knock-in/gene knockout mice surprisingly developed normally, demonstrating that subfunctionalized gene paralogues can be reset to the primitive state and replaced with a single copy, provided that both paralogous proteins retain equivalent activities.
Of course, enhancer evolution can also lead to divergent expression patterns, possibly driving phenotypic innovation. The introduction of novel TFBS into pre-existing functional enhancers may allow a gene to acquire expression in an additional spatio-temporal domain without affecting its ancestral expression pattern. Rebeiz et al. reported that the neprilysin-1 gene (Nep1) of Drosophila santomea has gained a novel expression pattern in optic lobe neuroblasts after accumulating four mutations near a pre-existing intronic enhancer responsible for driving Nep1 expression in ancestral domains such as the ventral ganglion and the retinal field. In this issue, Glassford & Rebeiz expand on their previous work by systematically testing the mutational paths that could have led to the accumulation of these four mutations. Interestingly, their results indicate that some of these paths were inaccessible owing to fitness costs associated with epistasis.
(c) Enhancers come and go
Enhancers are not only modified or created de novo but may also be lost during evolution, as will be further discussed in §6a. Bejerano and co-workers found that hundreds of conserved non-coding genomic regions have been independently lost in distinct mammalian genomes, raising the possibility that some of them are involved in lineage- or species-specific morphological variation. The existence of functionally overlapping or nearly redundant enhancers provides a genetic substrate for this type of mechanism; the surprising lack of detectable phenotypes in four different strains of mutant mice lacking ultraconserved enhancers exemplifies how animals may lose sequences selected for millions of years without notable deleterious effects, at least in the comfortable environment of a laboratory animal facility.
Another interesting evolutionary mechanism leading to loss of function of tissue-specific enhancers is the appearance of a novel transcriptional repressor that silences previously active enhancers, as reported in a recent study of the molecular evolution of the vertebrate paralogues pax2 and pax8. In Xenopus laevis tadpoles, pax2 is expressed in several cell types, whereas pax8 is expressed only in a subset of pax2-expressing tissues. Despite this difference, pax2 and pax8 share several paralogous, conserved non-coding elements that individually recapitulate a pax2-like reporter expression pattern when ligated upstream of a minimal β-actin promoter and tested in transgenic tadpoles. Surprisingly, when the β-actin promoter was replaced by the Xenopus pax8 proximal promoter, the pax2 and pax8 enhancers drove a more restricted, pax8-like expression pattern, demonstrating that the ancestral enhancers of pax2 and pax8 are equally able to direct pax2-like, multi-tissue expression, except when placed upstream of a silencer present in the pax8 proximal promoter that suppresses expression outside the pax8-expressing tissues.
6. Enhancer evolution and animal diversity
What direct evidence is there that mutations in enhancer sequences underlie the evolution of important traits? The number of detailed studies of phenotypic diversification driven by enhancers is still very small, because such studies require (i) an easy-to-spot phenotype that displays intra- or interspecific variation; (ii) knowledge of the gene(s) that control(s) this phenotype; (iii) detailed data on the DNA sequences (promoter and enhancers) that regulate the gene(s) in the relevant organ or structure; (iv) sequence information from related species or populations in which the phenotype shows relevant variation; and (v) an amenable experimental system in which to perform functional studies with the different versions of the enhancer, usually by transgenesis.
Owing to these requirements, most studies on the importance of cis-regulatory variation in species evolution and adaptation remain limited in depth. Variation in many interesting traits is known to be controlled by cis-regulatory divergence, but the DNA sequences responsible are usually unknown. For instance, it is well established that colour patterns on the wings of Heliconius butterflies are linked to divergent regulation of the optix gene, but the cis-regulatory sequences have yet to be found [106,107]. In other cases, nucleotide polymorphisms linked to a trait are known, but their regulatory role has not been analysed in detail. One example of the latter is lactase expression in human populations. Several pastoralist human populations around the world have independently evolved the capacity to express lactase during adulthood, which allows the digestion of milk and dairy products. Many polymorphisms linked to the persistence of lactase expression in adults are known, all of them located in an intron of an adjacent gene. This intronic region is conserved only in primates and displays enhancer activity in cell culture assays. However, a detailed characterization of the lactase regulatory elements and their variants in a more physiological model, such as transgenic mice, is still lacking [108,109]. In yet other cases, suggestive differences in enhancer sequence and activity between species are known, but the derived expression pattern cannot be linked to a trait.
A detailed understanding of the relation between enhancer evolution and trait variation is only possible through a series of studies of a particular locus in several species or populations. Table 1 lists studies that provide compelling evidence for how enhancer sequence evolution has contributed to specific traits in animals. These examples illustrate several mechanisms that lead to divergence of enhancer function: deletions, modifications of the TFBS repertoire and acquisition of new enhancer regions.
(a) Enhancer deletion and phenotypic variation
A dramatic mechanism of enhancer evolution is simply the deletion of an enhancer, which leads to the loss of gene expression in a particular region of the body. A good example concerns a pelvic enhancer of the Pitx1 gene in the three-spined stickleback, a teleost fish that lives in the sea as well as in North American and Northern European freshwater lakes. Marine sticklebacks possess bony spines in the pelvic region, presumably as protection from predators; freshwater populations, on the other hand, usually lack large pelvic armour. Development of such spines depends on the activity of a TF, Pitx1, which is expressed in the pelvic region under the control of a specific enhancer [111,112]. Several freshwater stickleback populations lack spines due to deletions of the pelvic enhancer; interestingly, deletions have happened independently several times in isolated populations of freshwater sticklebacks [111,112].
Other likely cases of enhancer deletion during evolution were uncovered by McLean et al. They identified 509 non-coding DNA regions that are conserved in mammals but have been deleted in the human lineage since the split between humans and chimpanzees. One of the sequences absent in humans is an enhancer of the androgen receptor (AR) gene that drives expression in the vibrissae (sensory hairs of the face) and the genital tubercle during development. Testosterone is necessary for the growth of sensory vibrissae and for the development of penile spines. Both structures are present in rodents and non-human primates but absent in our species. A reasonable hypothesis, therefore, is that the loss of this AR enhancer in our close ancestors led to morphological change in our lineage. The work on teleost Pitx1 and mammalian AR thus shows that deleting enhancers that control the expression of a TF in a particular region can make evolution proceed in large steps. Such deletions can eliminate whole morphological structures, provided that the absence of these structures (like pelvic and penile spines) is not selected against. McLean et al. also uncovered another enhancer deletion of potential functional significance. It affects the expression of the tumour-suppressor gene GADD45G and might be related to human-specific neuronal proliferation in the forebrain.
(b) Evolution by mutations in TFBS
Another, more subtle mode of enhancer evolution is through point mutations and small indels that create or eliminate TFBS within an enhancer, thereby changing its activity. Several enhancers that control pigmentation in flies of the genus Drosophila are known to have evolved in this way. The yellow gene, which encodes an enzyme involved in pigment synthesis, is expressed in the Drosophila wing under the control of the spot enhancer. In D. melanogaster, yellow expression is low and even throughout the wing, leading to uniform pigmentation. In Drosophila biarmipes, by contrast, yellow is highly expressed in one corner of the wing, creating a dark spot. Wing spots may play a role in courtship, when males perform an elaborate dance in front of females. Analyses of enhancer sequence differences and transgenic assays indicate that the spot enhancer of D. biarmipes contains binding sites for a transcriptional activator, recently identified as Distal-less. It also contains sites for the repressor Engrailed, which sets the posterior boundary of the yellow spot. In subsequent work, Prud'homme et al. surveyed the presence of wing spots across the Drosophila phylogeny. They found that the wing spot has been independently gained and lost several times during the evolution of the genus. Loss of the wing spot in Drosophila gunungcola and Drosophila mimetica occurred through mutations in the same spot enhancer, a case of convergent phenotypic evolution driven by changes in the same regulatory element. Notably, Drosophila tristis ancestrally lacked a spot. Its gain of a wing spot occurred not through mutations in the spot enhancer, as in D. biarmipes, but in another enhancer, located in a yellow intron, that directs expression to the wing veins. Thus, a remarkably similar phenotype (the dark spot on the upper right corner of the wing) was generated independently by the co-option of two different yellow enhancers during fly evolution.
The yellow gene is also involved in the male-specific pigmentation of the abdomen, where its expression is controlled by the body enhancer. In D. melanogaster and D. pseudoobscura, body enhancer activity depends on binding of the TF Abdominal-B (Abd-B). However, the TFBS responsible for this binding have been lost from the body enhancer of Drosophila kikkawai, which lacks abdominal pigmentation. In another species lacking abdominal pigmentation, D. santomea, it is the expression of tan that is altered. tan encodes an enzyme with functions in pigmentation and vision. Its expression depends on an enhancer that is conserved across the genus Drosophila. In D. santomea, however, at least three different kinds of mutations, including point mutations and deletions, render the enhancer non-functional in the abdomen, which remains unpigmented. Variation in an enhancer of ebony, which encodes an enzyme involved in pigmentation, has likewise been linked to abdominal pigmentation in African D. melanogaster populations.
Another variable phenotype in flies that has been studied in great depth is trichome development in Drosophila larvae. The precise pattern of trichome distribution depends on several enhancers that control the expression of the TF shavenbaby (svb). In this issue, Stern & Frankel illustrate this by summarizing 13 years of continuous work on the evolution of svb expression in several Drosophila species. The loss of quaternary trichomes in Drosophila sechellia larvae is due to several (at least five) point mutations in one particular enhancer, E6. These mutations reduce svb expression in the region of the larva that gives rise to quaternary trichomes. Interestingly, each mutation in E6 has a small effect on its own, so a substantial reduction in svb expression is achieved only when all sites are mutated. Thus, loss of enhancer activity can result not only from enhancer deletion but also from the accumulation of point mutations of small effect. This route was possibly favoured because complete enhancer loss would cause pleiotropic effects.
In tetrapods, limb anatomy and length are partially controlled by the TF Prx1. Bat forelimbs are much longer than those of mice. Cretekos et al. showed that replacing a mouse Prx1 limb enhancer with the orthologous enhancer from a bat made the forelimbs of the mutant mice slightly longer. This indicates that mutations in the enhancer alter its activity and contributed to the evolution of forelimb anatomy. Another recent example of enhancer mutations driving innovation is the work of Guerreiro et al. on the axial anatomy of snakes. Vertebrae in tetrapods can usually be classified into several types (cervical, thoracic, lumbar, sacral and caudal). Snakes, by contrast, are characterized by a homogeneous vertebral anatomy, with a large number of thoracic-like, ribbed vertebrae. In vertebrates, rib formation at posterior levels of the vertebral column is repressed by the TF Hoxa10, but in snakes this repression does not take place. Guerreiro et al. studied an enhancer of Myf5, a gene necessary for proper axial development. They identified binding sites for the TFs Pax3 and Hoxb6, which act as activators, and for Hoxa10, which acts as a repressor. In snakes, a single nucleotide substitution in the enhancer prevents Hoxa10 binding without interfering with activator binding. This likely explains why ribbed vertebrae form in snakes even though Hoxa10 expression is not altered in this group [116,129].
(c) Acquisition of new enhancers and expression territories
During evolution, new regulatory activities certainly originate not only by tinkering with pre-existing enhancers but also through the appearance of new enhancers. Patterns of CNE distribution suggest that new enhancers played a role in several stages of vertebrate evolution. In addition, a burst of co-option of transposons as new cis-regulatory elements is associated with the reproductive innovations of placental mammals [131,132]. Recently, researchers mapped an enhancer-associated chromatin mark (H3K27ac) at equivalent stages of limb development in mouse, macaque and human. This uncovered many putative enhancers that seem to be active only in the developing human limb and that might contribute to the specific limb phenotype of our species.
An important event in the history of vertebrates was the morphological transition from fin to limb in the ancestor of tetrapods. Hoxd genes are expressed more intensely in the developing limbs of tetrapods than in the developing fins of fish. It has been hypothesized that this difference might be related to the morphological differences between these structures. Recently, Freitas et al. found that overexpression of Hoxd13 in the developing zebrafish fin increases proliferation of the chondroskeleton. The chondroskeleton then acquires anatomical and molecular characteristics similar to those of a tetrapod limb. In addition, they showed that a tetrapod-specific Hoxd enhancer, CsC, drives robust expression in the fins of transgenic zebrafish. This indicates that the trans-acting factors able to drive strong Hoxd expression were already in place before the appearance of the tetrapod limb [115,134].
Many of the studies described above that pinpoint enhancer mutations leading to morphological innovation began by mapping a large-effect genetic locus responsible for variation between closely related species or populations [135–138]. Mapping may also detect genomic regions with signatures of a selective sweep, an indication that natural selection is driving a variant to fixation in the population. In this issue, Glaser-Schmitt et al. describe a selective sweep around an enhancer of the CG9509 gene, which encodes an enzyme of unknown function. This enhancer is responsible for consistently higher expression of the gene in European than in sub-Saharan African populations of D. melanogaster. The increased expression of the enzyme outside sub-Saharan Africa suggests that this phenotype has fitness value, perhaps owing to a possible detoxifying function of the enzyme.
7. Enhancer evolution in humans
In their seminal paper, King & Wilson observed that chimpanzee and human proteins are exceptionally similar in sequence. They deduced that the differences between these species must mainly be due to mutations in regulatory DNA. Testing this idea is difficult because humans cannot be experimentally manipulated like model organisms. Nevertheless, the work of McLean et al. discussed above indicates that, at least for some traits, highly suggestive experimental evidence can be obtained for human-specific non-coding mutations that helped mould our species. In this regard, sequences displaying signs of accelerated evolution in humans often have enhancer activity and might be responsible for human-specific phenotypic adaptations. One example is HACNS1, a limb enhancer showing evidence of accelerated evolution in the human lineage. In transgenic mice, human HACNS1 shows increased transcriptional activity compared with the orthologous chimpanzee and macaque enhancers. This is probably due to mutations that prevent the binding of repressors. It has been hypothesized that this might be linked to the evolution of limb morphology. However, the gene controlled by HACNS1 is unknown, and no experimental evidence for a function in limb development is available. In this issue, Capra et al. identify potential enhancers by combining data on non-coding human accelerated regions with genome-wide surveys of chromatin marks and of transcription factor and cofactor binding. They tested a set of human and chimpanzee enhancer orthologues experimentally. Most can act as enhancers in transgenic mice and, consistent with the evidence for positive selection in humans, many of them differ in regulatory activity between the two species.
These human accelerated regulatory sequences, together with other putative human-specific enhancers found in genome-wide surveys [142,144], are a rich dataset for discovering the distinctive features of gene regulation that make us human. Taking advantage of some of these databases, the group of Lucía Franchini identified the genes carrying the highest numbers of human-specific accelerated sequences. The developmental brain gene NPAS3 tops this ranking with as many as 14 elements that are highly conserved in mammals, including primates, but carry human-specific nucleotide substitutions. In this issue, Kamm et al. study the accelerated element 2xHAR142, located in intron 5 of human NPAS3. They show that the mouse and chimpanzee 2xHAR142 orthologues behave as transcriptional enhancers in transgenic mice. These orthologues drive lacZ expression to regions of the central nervous system similar to those where mouse Npas3 is normally expressed. Interestingly, the human 2xHAR142 orthologue extends the domain of lacZ expression to the developing anterior telencephalon. This is an example of human-specific heterotopy promoted by an accelerated transcriptional enhancer, and it could have contributed to the characteristic enlargement of this brain area in humans.
8. Concluding remarks
Deciphering the genetic mechanisms that operated during the past 650 Myr of animal evolution to create the seemingly infinite variety of animals that populate our planet has been one of the holy grails of biology. However, only recently have scientists developed experimental tools to study problems that were much more difficult to tackle in the pre-genomic era (before ca 2005). This Theme Issue comes at a critical time. Genome sequences are now massively available for several species. Combined with single- and multiple-gene functional studies in transgenic organisms and sophisticated maps of chromatin features, they have had an unprecedented impact on identifying potential cell-type-specific enhancers with functional roles in different lineages or species. We are just beginning to understand the different mechanisms by which enhancers evolve and contribute to novel forms and functions. The few examples documented in the literature, most of which are listed above, are surely only the tip of the iceberg. Hundreds of thousands of candidate enhancer sequences have been identified by sequence conservation and enhancer-associated chromatin marks in many different animal genomes. However, data derived from genome-wide studies are descriptive in nature. To make sense of whole-genome surveys and bioinformatic predictions, it will therefore be important to further develop methods to evaluate regulatory function in vivo. In this regard, novel techniques have recently been developed to test the regulatory activity of DNA elements at a large scale (see for instance ref. ). These should help functional studies keep pace with genome-wide surveys. In addition to reporter gene assays, it will also be desirable to accelerate the study of enhancers through loss-of-function assays, which are perhaps the best way to evaluate the function of a DNA sequence.
In recent years, promising genome-editing techniques based on the prokaryotic CRISPR/Cas9 nuclease [146,147] and on transcription activator-like effector nucleases [148–150] have emerged. These techniques could be rapidly adapted for loss-of-function studies of enhancers in several vertebrate and invertebrate species. This includes organisms in which knockouts of regulatory regions were previously not feasible, as has recently been achieved in zebrafish [148,151,152] and medaka. The use of these techniques will certainly accelerate discoveries in the field of enhancer evolution and animal diversity.
The articles, reviews and perspectives that follow this introduction shed more light on the still murky and mysterious world of gene expression evolution in the animal kingdom. Each of these papers, in one way or another, consolidates an idea: there will probably be no fixed law, like gravity, to explain at the molecular level how endless forms most beautiful and most wonderful have evolved and are still evolving. Rather, a wide variety of peculiar molecular mechanisms seem to work together to put the genome into action. They do so in each cell type of each animal species, at every moment in life and under every possible physiological and environmental circumstance.
This work was supported by NIH grant no. DK068400 (M.R.), the Agencia Nacional de Promoción Científica y Tecnológica, Argentina (M.R. and F.J.S.) and the Universidad de Buenos Aires (M.R.).
One contribution of 12 to a Theme Issue ‘Molecular and functional evolution of transcriptional enhancers in animals’.
© 2013 The Author(s). Published by the Royal Society. All rights reserved.