CTCF has it all. The transcription factor binds to tens of thousands of genomic sites, some tissue-specific, others ultra-conserved. It can act as a transcriptional activator, repressor and insulator, and it can pause transcription. CTCF binds at chromatin domain boundaries, at enhancers and gene promoters, and inside gene bodies. It can attract many other transcription factors to chromatin, including tissue-specific transcriptional activators, repressors, cohesin and RNA polymerase II, and it forms chromatin loops. Yet, or perhaps therefore, CTCF's exact function at a given genomic site is unpredictable. It appears to be determined by the associated transcription factors, by the location of the binding site relative to the transcriptional start site of a gene, and by the site's engagement in chromatin loops with other CTCF-binding sites, enhancers or gene promoters. Here, we will discuss genome-wide features of CTCF binding events, as well as locus-specific functions of this remarkable transcription factor.
CTCF is a ubiquitously expressed and essential protein and is, in many ways, an exceptional transcription factor. It was first described as a transcriptional repressor, but was also found to act as a transcriptional activator [3,4]. Most strikingly, it harbours insulator activity: when positioned between an enhancer and a gene promoter, it can block their communication and prevent transcriptional activation [5–7]. Systematic chromatin immunoprecipitation experiments combined with high-throughput sequencing (ChIP-seq) have been performed to map CTCF binding events across the genome in many tissues of different species [8–10]. They show that the genome is covered with a myriad of CTCF binding sites. More than most other transcription factors, CTCF appears to bind to intergenic sequences, often at a distance from the transcriptional start site (TSS). CTCF was one of the first proteins demonstrated to mediate chromatin looping between its binding sites [12,13]. Further evidence for its role in the organization of genome structure comes from observations that it frequently binds to boundaries between chromosomal regions that occupy distinct locations in the nucleus, to boundaries between regions with different epigenetic signatures and/or different transcriptional activities, and to boundaries between recently identified topological domains, which are spatially defined chromosomal units within which sequences preferentially interact with each other [14,15]. Here, we will discuss studies on CTCF and evaluate its function in genome folding and gene expression.
2. CTCF at the β-globin and the H19–Igf2 locus: a short history
Functions of the versatile DNA-binding protein CTCF were initially explored at individual loci, in particular at the β-globin locus and the imprinted H19–Igf2 locus. The chicken β-globin locus carries a DNaseI hypersensitive site (5′HS4) at its 5′ side that separates the locus from neighbouring heterochromatin, and this site was found capable of blocking enhancer activity. CTCF was subsequently demonstrated to be responsible for this insulator activity of 5′HS4. The human and mouse β-globin loci are also located inside large chromosomal regions of inactive chromatin and are similarly flanked by CTCF-binding sites [17,18]. These were suspected to form a barrier for incoming heterochromatin, but their deletion did not lead to closing or inactivation of the β-globin locus [12,19]. The application of chromosome conformation capture (3C) technology enabled the demonstration that the β-globin CTCF sites physically interact with each other. They form large chromatin loops encompassing the β-globin main regulatory element, the locus control region (LCR), and its genes. These loops are erythroid-specific and are formed in erythroid progenitor cells, prior to LCR-mediated high expression of the β-globin genes (figure 1a; [12,20]). It was speculated that the CTCF loops can facilitate subsequent spatial interactions between the LCR and its target genes, but evidence for this is still lacking.
Another locus historically important for CTCF's reputation as an interesting transcription factor is the imprinted H19–Igf2 locus. The locus contains a differentially methylated region known as the imprinting control region (ICR), located between the H19 and the Igf2 genes. The ICR determines that H19 is active on the maternal allele and that Igf2 is transcribed from the paternal allele [21,22]. CTCF entered the stage here when it was found to bind to the ICR in a methylation-dependent manner: the binding of CTCF to the unmethylated maternal ICR prevents shared enhancers near the H19 gene from reaching across and activating Igf2. On the paternal allele, CTCF cannot exert its insulator activity as DNA methylation prevents its binding to the ICR (figure 1b; [6,23]). Again, chromatin loops are formed and seem important for ICR functioning [24–26]. Allele-specific chromatin loops with both enhancers and promoters are formed by the maternal, CTCF-bound ICR, suggesting that such contacts may underlie CTCF-mediated insulator activity. Collectively, the early studies on CTCF functioning at the β-globin and the H19–Igf2 locus revealed that the protein can interfere with promoter–enhancer communication. They also showed that CTCF can form chromatin loops between its binding sites, and perhaps also with other regulatory sequences.
3. CTCF binds across the genome to chromatin boundaries, enhancers and gene promoters
The systematic mapping of genome-wide binding sites by ChIP revealed that CTCF binds to tens of thousands of genomic sites [10,11,27]. Association with roughly one-third of these sites is relatively conserved across different cell types. An inter-species comparison between CTCF binding profiles in the liver of five mammalian organisms uncovered approximately 5000 sites that are ultra-conserved between the species and tissues. These appear to be the high-affinity binding sites, suggesting that differences in affinity could be related to the strength of conservation. The activation of retro-elements has produced species-specific expansions of CTCF-binding sites, and this form of genome evolution is still highly active in mammals. Classification of CTCF binding sites based on a consensus motif score led to similar conclusions: high occupancy sites appear to be conserved across cell types, whereas low occupancy sites are more tissue restricted.
The CTCF consensus binding sequence contains CpG and can, therefore, be subject to DNA methylation. CTCF is able to bind to methylated DNA sequences in vitro, but preferentially binds to unmethylated sequences, as seen also at the H19–Igf2 locus. In fact, DNA methylation appears to play a role in some of the tissue-specific binding events of CTCF. Moreover, CTCF can influence DNA methylation by forming a complex with two enzymes related to DNA methylation: poly(ADP-ribose) polymerase 1 (PARP1) and the ubiquitously expressed DNA (cytosine-5)-methyltransferase 1 (DNMT1). CTCF activates PARP1, which then can add ADP–ribose groups to DNMT1 to inactivate this enzyme, with maintenance of methyl-free CpGs as the result [30–32].
A portion of CTCF binding sites is found enriched at transitions between active chromatin (high in H2AK5ac) and inactive chromatin domains (high in H3K27me3) [27,33]. This seems particularly true for retrotransposed CTCF binding sites. CTCF sites frequently flank the so-called lamina-associated domains (LADs). LADs are chromosomal regions associated with the lamin-based protein network that coats the inner side of the nuclear envelope; these chromosomal regions tend to be transcriptionally inactive. Its presence at LAD boundaries suggests that CTCF helps to organize the three-dimensional structure of chromatin. In Drosophila, the knockdown of CTCF leads to decreased levels of H3K27me3 inside inactive domains, indicating that CTCF binding at boundaries is required for the maintenance of repression. Association of CTCF follows the resetting of active and inactive domains during cellular differentiation, further suggesting that it functions to separate different chromatin states. Some of the LADs also dynamically change during cellular differentiation, but whether CTCF binds to the borders of these differential LADs is currently unclear.
Although CTCF binding is often found distal to TSSs, it does show a strong correlation with gene density (figure 2a,b). Indeed, evidence for a direct role of CTCF in transcription regulation came from early studies on individual genes [3,37]. Genome wide, a portion of CTCF sites co-localize with the promoter-specific H3K4me3 mark and another part coincides with the enhancer mark H3K4me1. CTCF binding events at promoters tend to be conserved across tissues, whereas CTCF binding to enhancers is more tissue restricted.
4. CTCF and cohesin share DNA binding sites
An unanticipated observation was the co-localization of cohesin with many of the chromosomal binding sites of CTCF [38–42]. Cohesin has always been associated with DNA replication and sister chromatid cohesion during the S, G2 and M phases of the cell cycle. It is a protein complex that contains members of a family of ‘structural maintenance of chromosomes’ proteins. The complex forms a ring-like protein structure that is thought to embrace two DNA helices. Surprisingly at the time, cohesin was also found to bind chromatin in post-mitotic cells, with half of its binding sites overlapping with CTCF sites [38–40]. Cohesin association with these sites is dependent on the presence of CTCF: without CTCF, cohesin still binds to chromatin but is no longer found at specific sequences. In contrast, CTCF does not rely on cohesin for finding its DNA binding sites. One possibility is that bound CTCF serves as a roadblock or barrier to position a sliding cohesin molecule on the chromatin template [38–42].
Its cell-cycle-independent association with DNA suggests that cohesin has an additional role in gene regulation. Given its capacity to hold together two sister chromatids, cohesin is obviously also attractive as a looping factor. Indeed, at the H19–Igf2 locus, cohesin was shown to be important for CTCF-mediated chromatin loop formation and proper regulation of Igf2 transcription. Similarly, at the interferon gamma (IFNG) locus, depletion of cohesin was found to disrupt chromatin loops between regulatory DNA sequences and cause a reduction in IFNG expression. Also at the β-globin locus, cohesin has been implicated in chromatin looping, not only between the flanking CTCF sites but also between the LCR enhancer region and the downstream β-globin target genes. Conditional deletion of cohesin in thymocytes was shown to disrupt the formation of regulatory chromatin loops in the T-cell receptor-α locus, with reduced transcription and impaired V(D)J rearrangement as a consequence. Pairwise comparison between two cell types revealed that it is mostly the CTCF-independent cohesin-binding events that show cell-type specificity. At these sites, cohesin is often found co-localized with mediator and RNA polymerase II (RNAPII), indicating a CTCF-independent function at enhancer sequences. Consistent with this, these genomic sites were often found to be close to actively transcribed genes, and to be co-occupied by tissue-specific transcription factors [49,50]. Collectively, this shows that CTCF and cohesin have shared and independent functions at regulatory sequences in the genome. Cohesin can form chromatin loops during interphase. However, whether this occurs through its embracement of two DNA double helices still awaits formal proof.
5. CTCF and other binding partners
CTCF performs multiple roles and, in agreement with this, the protein shares chromatin binding sites with many other factors [51–53]. Co-association events such as those with the histone deacetylase SIN3, the thyroid hormone receptor, nucleophosmin, Kaiso and the DEAD-box RNA helicase p68 with associated non-coding RNA have been implicated in its insulator function. Interestingly, the p68 RNA–protein complex appears required for positioning cohesin at the CTCF sites of the H19–Igf2 ICR. In addition, CTCF co-occupies sites with the transcription factors FOXA1 and the oestrogen receptor (ER). These sites tend to locate near ER-responsive genes, suggesting that CTCF facilitates their transcriptional activation. Furthermore, CTCF recruits the basal transcription factor TAF3 to intergenic sites in embryonic stem cells (ESCs), where TAF3-dependent chromatin loop formation was shown to activate gene transcription. In a study that monitored RNAPII tracking along long tumour necrosis factor-alpha responsive genes, pausing of RNAPII was observed at CTCF- and cohesin-bound sites. This pausing can serve to incorporate weak exons and, therefore, facilitate alternative splicing. Thus, intra- and intergenic CTCF sites can have many different roles.
6. CTCF function at individual gene loci
Given its diverse activities, it seems necessary to zoom in on individual loci to understand CTCF's local function. At the proto-oncogene Myb locus, CTCF binding occurs in the first intron of the gene, where it inhibits RNAPII elongation. Transcriptional pausing by CTCF can be overcome by upstream enhancers that bind tissue-specific transcriptional activators and loop towards the Myb promoter. At the major histocompatibility complex class II (MHCII) locus, CTCF and cohesin binding to, and looping between, upstream sequences precedes transcriptional activation. Upon binding of the MHCII trans-activator CIITA to the promoter sequences, loops are induced between them and the various CTCF sites, resulting in increased expression of MHCII genes [64,65].
CTCF also binds to many sites across the immunoglobulin and T-cell receptor loci. In conditional CTCF knockout mice, V gene usage in the Igκ light chain locus was found to be altered, with increased recombination with proximal and reduced recombination with distal V segments. This was accompanied by corresponding changes in germline transcription at these locations, suggesting that CTCF, like cohesin, mediates gene usage of the antigen receptor loci via the local regulation of germline transcription. In this model, germline transcription increases the accessibility of a region, which facilitates the selection of its gene segments for recombination. CTCF depletion does not always result in aberrant gene expression. In the same conditional knockout mice, Hoxd gene expression in the developing limb bud was unaltered after knockout of CTCF. Hoxd gene expression in the limb bud is under the control of many distant regulatory sequences that physically loop towards the genes. Unaltered Hoxd expression in the absence of CTCF suggests that these enhancer–promoter loops are not influenced by CTCF binding to sites in and around the locus. This raises the question of whether CTCF has any impact on the three-dimensional topology of the locus. While Hoxd expression was not affected, CTCF depletion did cause massive cell death in the limb, showing that CTCF is critical for the transcriptional regulation of other genes involved in cellular homeostasis.
A final locus that is interesting to discuss is the protocadherin-α cluster. This cluster encodes neuron-specific transmembrane proteins that are mono-allelically expressed and thought to be involved in the recognition and diversification of neurons. Expression of the cluster is under control of a downstream enhancer that influences the expression of the 12 isoforms, each of which has its own alternative promoter. Interestingly, the enhancer as well as each individual promoter has a binding site for CTCF. Expression of the isoforms is reduced upon conditional CTCF knockout in post-mitotic neurons. This suggests that long-range interactions are part of the regulatory process that controls transcription of these genes [70–72].
Collectively, these studies support the idea that chromatin-bound CTCF can attract many different transcription factors in a tissue- and genomic context-specific manner. Its exact function at a given genomic site is probably determined by these associated transcription factors, by the location of this site relative to the TSS of a gene, and by its engagement in chromatin loops with other CTCF-binding sites, enhancers or gene promoters.
7. Genome-wide chromatin loops mediated by CTCF
A computational intersection of the genomic binding sites of CTCF (assessed by genome-wide ChIP) with a genome-wide DNA contact map generated by Hi-C suggested that CTCF is involved in chromatin interactions between and within chromosomes across the genome. Chromatin interaction analysis with paired-end tag sequencing (ChIA-PET) combines ChIP with a 3C approach and was developed to study genome-wide DNA interactions mediated by a protein of interest. When targeted to CTCF, ChIA-PET uncovered roughly 1500 intra-chromosomal and around 300 inter-chromosomal interactions mediated by this protein. The regions (10–200 kb) encompassed by the intra-chromosomal loops were subsequently clustered based on the distribution of histone marks. This showed that CTCF loops can contain active chromatin separated from inactive chromatin outside the loops and vice versa. CTCF can also capture enhancers and promoters together in a chromatin loop. Only a fraction of the approximately 40 000 CTCF binding sites was found to participate in the roughly 1500 CTCF-mediated loops. This implies that either not all interactions mediated by CTCF have been identified or that most CTCF sites are not engaged in the formation of loops.
The latter may well be true, because 5C (chromosome conformation capture carbon copy) technology showed that most CTCF sites across 1 per cent of the genome do not participate in chromatin loops, no matter whether they are co-occupied by cohesin or not. CTCF-bound sequences were often skipped by gene promoters making contacts with enhancers or with other CTCF sites even further away.
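As a rough check on the numbers quoted above, the following minimal Python sketch computes an upper bound on the fraction of CTCF sites that can act as loop anchors. It assumes, purely for illustration, that every loop uses two distinct binding sites and that no site anchors more than one loop; the function name is hypothetical, and the counts are the approximate figures given in the text rather than exact values.

```python
def max_fraction_of_sites_in_loops(n_sites: int, n_loops: int) -> float:
    """Upper bound on the fraction of binding sites that anchor a loop.

    Assumption (illustrative only): each loop has two anchors, and every
    anchor is a different binding site, so at most 2 * n_loops sites are used.
    """
    if n_sites <= 0:
        raise ValueError("n_sites must be positive")
    return min(1.0, 2 * n_loops / n_sites)


# Approximate figures from the text: ~40 000 sites, ~1500 intra-chromosomal loops.
fraction = max_fraction_of_sites_in_loops(n_sites=40_000, n_loops=1_500)
print(f"At most {fraction:.1%} of sites can be loop anchors")
```

Under these assumptions, no more than 7.5 per cent of the mapped sites can be loop anchors, so the large majority of sites lie outside the detected loops whichever of the two explanations applies.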
The recent availability of large genome-wide DNA interaction datasets [15,73] facilitates the assessment of CTCF's impact on chromosome topology. Sequences close on the chromosome to CTCF binding sites were shown to be biased in their DNA contacts: they interacted with other sequences on the same side of the CTCF site more than with sequences across this site (figure 3a). The same was previously shown for a different insulator protein in Drosophila: its binding to a site prevented flanking sequences from physically contacting each other across this site. Interestingly, this may provide an explanation for how insulators function: they can prevent spatial DNA contacts across the insulating sequence. In a particularly detailed genome-wide DNA contact study, topological domains were defined; they are chromosomal regions of on average 1 Mb in size, within which sequences preferentially interact with each other [15,79]. A strong conservation of topological domains was seen between tissues and even between species, suggesting that these domains do not themselves contribute to the specific identity of cells. Interestingly, CTCF binding sites were enriched in 20 kb windows surrounding the boundaries of these domains (figure 3b), re-emphasizing its role as a chromatin organizer. In one case, it was shown that disruption of a boundary led to intermingling of topological domains and caused misregulated expression of the genes involved. Unlike the topological domains themselves, contacts within the domains do change during differentiation. Here too, CTCF appears to play a role, probably to accommodate developmental changes in gene expression [80,81].
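The contact bias described above can be illustrated with a small Python sketch. It is not the method used in the cited studies: the function names, the binned contact matrix and the toy values are all hypothetical. The sketch compares bin pairs at one fixed linear separation, so that pairs spanning the site and pairs lying on one side of it are matched for genomic distance, and reports the mean contact frequency of each group.

```python
def mean(values):
    """Arithmetic mean, or NaN for an empty list."""
    return sum(values) / len(values) if values else float("nan")


def same_side_vs_across(contacts, site_bin, distance, window):
    """Mean contact of same-side pairs and of pairs spanning site_bin.

    contacts: symmetric square matrix (list of lists) of contact counts per bin pair
    site_bin: index of the bin containing the (hypothetical) CTCF site
    distance: separation in bins between the two members of each pair
    window:   number of flanking bins considered on either side of the site
    """
    lo = max(0, site_bin - window)
    hi = min(len(contacts) - 1, site_bin + window)
    same_side, across = [], []
    for i in range(lo, hi - distance + 1):
        j = i + distance
        if i < site_bin < j:                 # pair spans the site
            across.append(contacts[i][j])
        elif j < site_bin or i > site_bin:   # both bins on one side
            same_side.append(contacts[i][j])
        # pairs that include the site bin itself are ignored
    return mean(same_side), mean(across)


def toy_matrix(n_bins, site_bin, within=10, spanning=2):
    """Toy contact map: frequent contacts within a side, rare ones across the site."""
    def side(i):
        return (i > site_bin) - (i < site_bin)   # -1 left, 0 site, +1 right
    return [[within if side(i) == side(j) else spanning for j in range(n_bins)]
            for i in range(n_bins)]


contacts = toy_matrix(n_bins=9, site_bin=4)
same, spanning = same_side_vs_across(contacts, site_bin=4, distance=2, window=4)
print(same, spanning)
```

In this toy map, the same-side mean is five times the spanning mean, which is the pattern expected if a bound site hampers contacts across it. With real data, the comparison would need to be repeated over many sites and separations and corrected for coverage biases.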
8. Concluding remarks
Despite being the subject of intense research, CTCF manages to remain a mysterious transcription factor. It binds to many thousands of sites across the genome, where it can interact with a plethora of other transcription factors. It is often found engaged in chromatin loops, sometimes with and sometimes without the involvement of cohesin. It can form chromatin loops with other CTCF binding sites, but also with enhancer and promoter sequences. CTCF binds to sequences outside and away from genes, but also inside the gene body, where it appears capable of pausing the sliding polymerase molecule. Finally, CTCF binding sites still actively jump around as retrotransposable sequences, giving diversity to the CTCF binding landscape between different mammalian species.
We believe that the unifying theme that may explain the many, and sometimes opposing, functional consequences of CTCF association with chromatin is its ability to form chromatin loops. Depending on the sequences encompassed in the loops and those excluded from the loops, chromatin shaped by CTCF may facilitate or hamper three-dimensional contacts between enhancers and target genes, with different outcomes for transcription. Many questions remain, though: why do some CTCF sites form a chromatin loop and others not? To what extent does this rely on co-associated protein factors? How does the protein manage to interact with so many other transcription factors when bound to chromatin? One possibility is that CTCF serves as a roadblock for chromatin-scanning transcription factors that somehow get trapped when encountering the bound protein. What is the relevance of CTCF-mediated interchromosomal contacts? Does CTCF block enhancer–promoter communication by preventing 3D DNA contacts? Or does insulation involve the physical interaction of the insulator sequence with both enhancers and promoters? Answers to these questions are needed to predict whether a given CTCF binding event will be functionally irrelevant, will cause transcriptional activation or repression, will interfere with transcriptional activation or will create a chromatin boundary.
We would like to thank Patrick Wijchers and Peter Krijger for useful comments. This work was financially supported by grant no. 935170621 from the Dutch Scientific Organization (NWO) and a European Research Council Starting Grant (209700, ‘4C’) to W.d.L.
One contribution of 12 to a Discussion Meeting Issue ‘Regulation from a distance: long-range regulation of gene expression’.
© 2013 The Author(s). Published by the Royal Society. All rights reserved.