We have combined the circular chromosome conformation capture protocol with high-throughput, genome-wide sequence analysis to characterize the cis-acting regulatory network at a single locus. In contrast to methods which identify large interacting regions (10–1000 kb), the 4C approach provides a comprehensive, high-resolution analysis of a specific locus with the aim of defining, in detail, the cis-regulatory elements controlling a single gene or gene cluster. Using the human α-globin locus as a model, we detected all known local and long-range interactions with this gene cluster. In addition, we identified two interactions with genes located 300 kb (NME4) and 625 kb (FAM173a) from the α-globin cluster.
The past few years have seen the development of techniques that allow us to detect physical interactions between key chromosomal elements that regulate gene expression. Such analyses are based on chromosome conformation capture (3C), designed to identify and quantitate novel ligation products between any two specific DNA sequences (in chromatin) that become closely juxtaposed in the nucleus in vivo [1–3]. By quantifying ligation products in large populations of cells, the relative frequency of such physical interactions can be inferred. Using 3C, it has been shown that cis-acting elements located up to 1 Mb apart on a chromosome may interact (forming a chromosomal loop) when genes are switched on or off [5,6].
More recently, related techniques have been developed to analyse many intra- and inter-chromosomal interactions simultaneously, in an unbiased manner rather than focusing on pre-selected pairs of sequences [7–12]. Such methods ([8,9,11–15]; referred to as circularized 3C, or circular chromosome conformation capture (4C)) are based on the assumption that, as a result of cross-linking, cutting and ligating regions of interacting chromatin, circular molecules containing various combinations of associated segments of DNA will be formed (figure 1). Using inverse PCR primers located near the ends of a chosen fragment of interest (the ‘bait’), the amplified 4C library should capture all sequences (within a population of cells) that interact with the bait. An important aim of this approach is to characterize in detail the entire cis-regulatory network controlling a single gene.
In the work presented here, we have used a slightly modified 4C protocol with sufficient resolution to identify individual elements interacting with a chosen cis-element, and used this to analyse the human α-globin locus [16,17], which has previously been analysed using conventional 3C (from multiple fixed positions) in both erythroid (expressing) and non-erythroid (silent) cells [18,19]. All previously known interactions with the α-globin regulatory elements and interactions with two additional non-globin genes (NME4 and FAM173a) were identified.
(a) Generating a library of chromatin interactions using a circular chromosome conformation capture protocol
The methodology is based on the established 3C assay (figure 1; described in Gondor et al. and in §4). In brief, living cells were treated with formaldehyde to cross-link protein and DNA [21,22]. Cells were then lysed to allow restriction enzyme digestion of the cross-linked chromatin. The bulk of chromatin was cut into small fragments, while cross-linked, interacting protein/DNA complexes remained physically associated with each other (figure 1). These non-random interactions were covalently linked by ligation of the DNA fragments (while still in the context of chromatin) in a large volume to maximize the difference between infrequent ligation of random individual fragments and the more frequent ligation of closely associated fragments that interact in chromatin. Finally, cross-links were reversed and DNA was isolated.
In the standard 3C protocol, the enrichment for an interaction between two specific fragments (relative to the random background level of interactions) is assayed using quantitative PCR with a primer in each fragment (see primers A and C in figure 1 and [2,3]). However, interacting molecules can be circularized (figure 1). By designing appropriate PCR primers for the chosen bait, a library containing all of the fragments that were interacting at the time of the initial cross-linking can be made by inverse PCR (primers A and B in figure 1).
A frequently cutting restriction enzyme (DpnII, ^GATC) that efficiently digests cross-linked chromatin was used. It produces fragments with an average size of 433 bp, improving resolution compared with using 6-cutter enzymes [13–15]. Circles containing relatively short fragments can be efficiently amplified in subsequent PCR reactions (see below). Furthermore, the large 4 bp over-hang (5′-3′) remaining after DpnII digestion provides a very efficient substrate for ligation.
A second ligation step that increases the probability of re-circularization after isolation of DNA was found to improve the detection of previously characterized cis-interactions involving the α-globin promoters (see the electronic supplementary material, figure S1).
(b) Analysis of an amplified circular chromosome conformation capture library on tiled genomic microarrays using a promoter as the bait
Using this protocol, 4C libraries were generated from primary erythroid cells (eight replicates), primary T cells (two replicates) and Epstein–Barr virus (EBV)-transformed B lymphocytes (four replicates). These libraries were amplified using inverse PCR primers designed to capture all sequences interacting with the chosen bait (figure 1).
Primers for the α-globin gene promoters were designed within an 899 bp DpnII fragment (chr16:162 450–163 349 and chr16:166 254–167 153) containing the promoters of the duplicated HBA genes (α1 and α2, figure 2a). Labelled DNA was hybridized to tiled microarrays representing the human α-globin genes, their known regulatory elements and 500 kb of flanking DNA. Enrichment was measured relative to labelled, sonicated input DNA. In both non-erythroid cell types (T-lymphocyte and EBV-transformed B lymphocyte), positively enriched signals on the microarray were only found in regions over and immediately adjacent to the bait (figure 2b). No additional distant, long-range interactions were seen. As a positive control, we analysed the non-erythroid 4C libraries using inverse PCR primers within a previously characterized constitutive CTCF-bound region in the same genomic interval. As for other CTCF-bound regions [7,24], this bait was shown to participate in looping events with other CTCF-bound regions (see the electronic supplementary material, figure S2).
In erythroid cells, the α-globin promoters can be seen to interact with their flanking sequences to a much greater degree with new, specific and strong interactions (figure 2b). The most proximal strong interaction is with the promoter of a neighbouring gene HBM (µ globin), a globin-related gene expressed only in erythroid cells. The lack of such a signal in the non-erythroid tissues demonstrates that this is not a random interaction and is not due to cross-hybridization with the amplified HBA promoter. This comparison between chromatin derived from expressing and non-expressing cells shows that when both the HBM and HBA genes are active, their promoters frequently interact, which has been reported for other active promoters [26–28].
The main distal interaction occurs over the body of the C16orf35 gene containing the α-globin regulatory elements (multiple conserved sequence (MCS)-R1 to R3). In erythroid samples, the signal is maximal over this area with distinct peaks of interaction seen over each element clearly identifying MCS-R1 (hypersensitive site, HS-48); MCS-R2 (HS-40) and MCS-R3 (HS-33). A smaller peak can also be seen over a fourth conserved erythroid cis-element, MCS-R4 (HS-10), located between the C16orf35 gene and the HBA genes. Therefore, using this 4C protocol with the α-globin promoter as bait, we identified all known, non-random, tissue-specific interactions between the promoters and their upstream regulatory elements previously established by 3C analysis.
(c) Analysis of an amplified circular chromosome conformation capture library on tiled genomic microarrays using a distal regulatory element as the bait
To further validate these interactions, we reversed the direction of the capture by designing primers in a region (identified above) that interacts with the HBA genes. We, therefore, amplified the same 4C libraries with inverse primers designed within a 1205 bp DpnII fragment containing the MCS-R2 cis-element (chr16:102 658–103 864) and analysed this material on tiled microarrays. Figure 2c compares two erythroid and two non-erythroid (EBV-transformed B lymphocyte) samples. As with the promoter capture, there is little evidence of any long-range interactions in the B-lymphocyte cultures, in which the MCS-R2 element and HBA genes are inactive. By contrast, in erythroid cells, strong peaks of interaction are seen over two neighbouring cis-elements, MCS-R1 and MCS-R3, with a smaller peak over the more distal MCS-R4 element. Signal is also seen over the genomic area containing the HBA genes, and a strong peak is seen over the promoter of HBM.
The overlap and reciprocity of the interactions identified by 4C using the gene promoters as bait compared with those identified using MCS-R2 as bait support the previously proposed model in which these two sequences interact specifically in erythroid cells. Furthermore, not only do the promoters of the HBA genes interact with all four characterized cis-elements (MCS-R1-4), but when using just one of these elements (MCS-R2) as bait, we identify the α-globin promoters and also capture the other regulatory elements (MCS-R1, 3 and 4), strongly suggesting that all of these elements come together (possibly simultaneously) forming an active chromatin hub, as previously suggested for the β-globin gene cluster.
(d) Analysis of circular chromosome conformation capture libraries by high-throughput sequencing using paired-end reads
High-throughput sequencing (HTS) has revolutionized genome-wide analysis, and in 4C experiments has an advantage over microarray experiments by confirming the specificity of amplification and allowing the exclusion of mis-primed amplimers from downstream analysis (see below). A feature of the Illumina platform is the ability to sequence both ends of a DNA fragment and link the forward and reverse reads. This paired-end HTS (peHTS) protocol produces 50 bp of sequence from either end of a single DNA fragment.
Initially in our analysis, each end of the fragment was considered as a single read and mapped to a specific DpnII fragment. At this stage, the relationship between the two paired-ends was masked from the standard mapping tools. Once the individual ends had been mapped, the paired-end information (linking one single copy read to another) was then used to unequivocally score interactions with one end in the α-promoter and the other in the ligated, interacting fragment (see figure 3 and electronic supplementary material, figure S3a). Mis-primed fragments that lacked the expected sequence associated with the bait fragment could be excluded. Fragments that lie adjacent to each other in the genome (possibly resulting from non-digestion of chromatin or ligation simply based on their proximity) were also removed from the analysis.
Using this approach, we analysed two 4C libraries (independently derived from the primary erythroid cells of two individuals). These libraries were amplified with inverse primers (using the α-globin promoter as the bait), sonicated and prepared for paired-end sequencing. We defined an interaction as any read pair of which one end mapped to the bait and another mapped to a distal fragment in-cis or in-trans. By doing the experiment twice, we could substantially exclude random interactions or infrequent non-random interactions. The first dataset identified 338 consistently interacting regions of which 50 per cent were on chromosome 16, with other interactions distributed across the remainder of the genome. In the second dataset, there were 4172 interactions (although many of these were unique interactions recorded in a single paired-end read) of which 9 per cent mapped to chromosome 16. Although many of the interactions on chromosome 16 were consistently detected in both experiments, interactions with other chromosomes most frequently differed between experiments suggesting that they may be random.
To investigate this further, we generated two datasets (defining sequences and positions) of fragments that interacted with the α-globin promoters in both experiments. From these two datasets, we identified fragments common to both sets of data. Initially, we analysed the genomic distribution of these fragments irrespective of their number of interactions with the α-globin promoter (figure 4). These data suggest that most regions interacting with the α-globin promoters occur on chromosome 16 (presumably in cis), and that (using the α-globin promoter as bait) stable trans-interactions with this fragment are rare. We next analysed the number of sequences mapping to each of 122 interacting regions throughout the genome (identified in both experiments); in general, this should reflect the frequency of each interaction. We first analysed the number of interactions with each chromosome (regardless of the number of interacting regions on each chromosome). We excluded strong local interactions (from 157 000 to 170 000 bp on chr16; figure 4c) around the α-globin promoters to leave only those interactions representing true distal looping events. This analysis showed that 96.4 per cent of all interactions occurred on chromosome 16, although this approach does not exclude infrequent non-random interactions.
The signal on chromosome 16 was further dissected, showing most of these interactions occur in the terminal 0.5 Mb of the short arm of the chromosome (figure 4c). This region contains the α-globin genes, their regulatory elements MCS-Rs (MCS-R1, MCS-R2, MCS-R3 and MCS-R4) and at least 12 associated prominent CTCF-bound regions. We also found two long-range interactions with genes located 300 kb (NME4) and 625 kb (FAM173A) from the α-globin cluster. We have recently shown that the closest of these two genes (NME4) is upregulated in erythroid cells and this is under the control of MCS-R2. FAM173a is not expressed in erythroid cells, and, therefore, its interaction with α-globin may represent a structural interaction.
Almost half of the consistently mapped α-globin promoter interactions are found in 0.0005 per cent of the genome containing its regulatory elements. Ten per cent of interactions occur with the major regulatory element MCS-R2. Clearly these are frequent (1859 of 18 607 mapped interactions), reproducible (present in both biological replicates) non-random interactions.
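The proportions quoted above can be checked with simple arithmetic. The sketch below is illustrative only and rests on two assumptions that are not stated in the text: a haploid genome size of roughly 3.1 × 10^9 bp, and the reading that the 0.0005 per cent figure refers to the interval covering MCS-R1 to MCS-R3 (chr16:93 900–111 261) listed in §4.

```python
# Counts quoted in the text.
mcs_r2_interactions = 1859
total_mapped_interactions = 18_607

# Fraction of mapped interactions that involve MCS-R2 (about one in ten).
fraction_mcs_r2 = mcs_r2_interactions / total_mapped_interactions

# ASSUMPTION: the regulatory interval is chr16:93,900-111,261 (inclusive)
# and the haploid genome is ~3.1e9 bp; neither value is given at this point
# in the text.
REGION_START, REGION_END = 93_900, 111_261
ASSUMED_GENOME_BP = 3.1e9

region_bp = REGION_END - REGION_START + 1
percent_of_genome = 100 * region_bp / ASSUMED_GENOME_BP

print(f"MCS-R2 share of interactions: {fraction_mcs_r2:.3f}")
print(f"Regulatory interval as % of genome: {percent_of_genome:.5f}")
```

Under these assumptions the MCS-R2 share rounds to 10 per cent and the regulatory interval is of the order of 0.0005 per cent of the genome, in line with the values quoted.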
The 4C approach allows analysis of chromosome conformation, in an unbiased way, without prior knowledge, to identify all sequences (in cis and trans) that interact with a genomic element of interest, and, therefore, is of considerable value in identifying the comprehensive network of regulatory elements controlling individual genes [13–15]. This addresses a common and timely problem in genome annotation, and solves the issue of assigning the functional effects of a particular sequence or structural variant (e.g. identified in genome-wide association studies) to a specific gene.
By using a frequently cutting restriction enzyme (DpnII), tiled microarrays and HTS, cis-acting sequences were localized at a high resolution, and sequencing unequivocally identified the underlying cis-acting sequences. For the human α-globin cluster, these corresponded to previously identified regulatory elements. The use of HTS rather than microarrays considerably increased the specificity of the assay. Importantly, using paired-end reads it was possible to exclude mis-primed and non-digested sequences that do not represent true interactions with the cis-element being used as the bait. A criticism of this technique could be that the extensive rounds of amplification required and the heterogeneous sizes of the circles may blunt the dynamic range; however, the ability of the 4C technique to reproduce the known, predominant interactions in erythroid and non-erythroid cells is notable.
In addition to validating the approach used, analysis of the human α-globin cluster also revealed additional information about the interaction between the upstream regulatory elements and the α-globin promoter. It was previously known from 3C analysis that four erythroid elements (MCS-R1 to R4) interact with the α-globin promoter in erythroid cells. Furthermore, from the same 3C data, we had previously implicated CTCF/cohesin-bound regions in the establishment and/or maintenance of such loops consistent with the recent studies of others [7,24,32–36]. Preliminary experiments using direct sequence analysis of paired-end reads also identified re-circularized molecules that not only contained two or more MCS-R elements, but a number of captured sequences containing MCS-R2 (the major α-globin regulatory element) also contained the CTCF/cohesin element associated with HS-46, specifically in erythroid cells (see the electronic supplementary material, figures S3b and S4). As these sequences were found together on the same ligated molecules, this may reflect how the 4C protocol could reveal molecular interactions originating from a single locus. As this CTCF/cohesin-bound element lies between the MCS-R1 enhancer element and the α-globin promoters and appears to interact simultaneously with both of them, it appears that this CTCF-bound element does not act as either an enhancer blocker or a boundary element in this instance. Provisional analysis of these sequences showed that different interactions may be present in different individual cells suggesting that the interaction between the MCS-R elements and the α-globin promoter is dynamic. Further, deep sequencing data will be required to pursue this preliminary observation.
Other chromosome conformation studies have suggested that there may be a wide network of interactions between cis-acting elements throughout the genome and that specific trans-interactions (e.g. between α- and β-globin) occur frequently. Others have suggested that trans-interactions are transient and infrequent [8,38]. Using the 4C protocol described here, we did not observe frequent trans-interactions between the α- and β-globin loci either when analysing the experiments on microarrays (see the electronic supplementary material, figure S5) or by sequencing. Within the limits of these experiments, if interactions (determined by counting the number of interacting fragments obtained by paired-end sequencing) between the α- and β-globin promoters do occur, they are at least 1000 times less frequent than the functionally relevant cis-interactions (e.g. between the globin promoters and their upstream regulatory elements).
Consistent with the 4C data presented here, Hi-C, a relatively low resolution (1 Mb) approach for full genome analysis, proposed that cis-elements most frequently interact with sequences located on the same chromosome in cis; trans-interactions appeared to be rare. Here, we looked for consistent very long-range (greater than 1 Mb) interactions (in cis or in-trans) by comparing all interactions with the α-globin locus throughout the entire genome. It was shown that nearly all reproducible frequent, non-random interactions with the α-globin promoter are restricted to the terminal megabase of chromosome 16 (where the α-genes are located), and more than 74 per cent of these occur within the previously defined 170 kb α-globin domain (chr16:1–170 000).
Genes located on a particular chromosome often lie within a territory occupied by other sequences on the same chromosome (chromosome territory). In the Hi-C analysis, it was suggested that the high frequency of interactions across the chromosome (in cis) could best be explained by interactions occurring within the context of a specific chromosome territory. In contrast to many of the sequences analysed by Hi-C, there were relatively few consistent interactions between the α-genes and other genes located on chromosome 16 either in this study or in the published Hi-C analysis. One explanation could be that unlike many genes in the interstitial segments of chromosomes, the terminal 2 Mb region containing the α-globin locus consistently extends beyond the chromosome 16 territory in both erythroid and non-erythroid cells, and, therefore, may be more mobile and interact, albeit inconsistently, with a wide range of sequences below the detection of current analysis.
4. Material and methods
(a) Cell types
EBV-transformed lymphoblastoid (EBV B lymphocyte) cell lines were cultured in RPMI 1640 supplemented with 10% (v/v) fetal calf serum, 2 mM L-glutamine, 50 U ml−1 penicillin and 50 µg ml−1 streptomycin. Isolation and culture of primary human erythroblasts was carried out as described previously. T lymphocytes were cultured as follows. Whole blood was mixed with an equal volume of RPMI and subjected to Ficoll-Paque separation (GE Healthcare Life Sciences). The interphase mononuclear layer was removed and diluted in RPMI + 10% fetal calf serum. Cells were pelleted at 700×g for 5 min and the wash was repeated. Cells were then resuspended in 10 ml lysis buffer (150 mM NH4Cl, 10 mM KHCO3, 0.1 mM EDTA) and incubated on ice for 1 min. Cells were centrifuged at 700×g for 5 min and washed twice in RPMI + 5% fetal calf serum. After the final wash and centrifugation at 700×g for 5 min, cells were resuspended in 30 ml RPMI + 20% fetal calf serum supplemented with 1 mg ml−1 phytohaemagglutinin and 20 U ml−1 interleukin 2. Cells were incubated at 37°C with 5 per cent CO2 for 3–4 days.
(b) Circular chromosome conformation capture
The protocol used for 4C analysis was based on Zhao et al. [9,20] with a minor modification. Following phenol/chloroform extraction and ethanol precipitation of the 4C library, DNA was resuspended in 500 μl 1× ligation buffer and 60 U high concentration T4 DNA ligase (Fermentas) and incubated for 2 h at 16°C at 1200 r.p.m. (Eppendorf Thermomixer comfort). Following phenol/chloroform extraction and ethanol precipitation, DNA was resuspended in 100 μl water, of which 10 μl was used as template in Advantage-GC PCR (Clontech) as per manufacturer's instructions using 34 cycles of amplification (cycling conditions were 94°C for 2 min; 34 cycles of 94°C for 30 s, annealing temperature for 30 s, 68°C for 5 min followed by a single extension cycle at 68°C for 8 min). Primer sequences and annealing temperatures are presented in table 1. The resulting amplified DNA was ethanol precipitated and resuspended in 20 μl water, of which 5 μl was hybridized to a custom α-globin tiled microarray using sonicated genomic DNA as input, as previously described. Data from the microarray experiments are available from the GEO database under the accession no. GSE42384.
(c) Preparation of circular chromosome conformation capture material for Solexa/Illumina sequencing
DNA obtained from the 4C amplification was reduced to an average size of 500 bp by sonication, either by shearing for 10 min using a Sonic Dismembrator 550 (cup horn, Fisher Scientific, Canada) or using a Covaris S2 sonicator (KBiosciences, Europe) using a duration of 90 s, intensity of 3 and a duty cycle of 5. A 300–600 bp fraction was gel extracted from an 8 per cent PAGE gel and prepared for paired-end sequencing using the manufacturer's recommended protocol as briefly set out here. Sonicated DNA from a 4C amplification was prepared for sequencing by end repair, A-tailing and ligation of adapters (Illumina) as outlined in the manufacturer's recommendations. Following adapter ligation, some minor modifications were made to the protocol: DNA was amplified with 18 cycles of PCR before size selection of 200–350 bp fragments from a gel. The excised library was purified using the QIAquick gel extraction kit (Qiagen) before quality checking on an Agilent bioanalyzer. DNA was quantified using the Quant-iT dsDNA HS assay kit (Invitrogen) before dilution to 10 nM. Paired-end reads of 51 bp were generated on the Illumina GAII platform. A single lane of paired-end sequence was sufficient to produce a robust signal (table 2).
(d) Mapping of Solexa/Illumina reads
Paired-end sequences were mapped as single-end reads to overcome the built-in assumptions about the relative positioning of paired-end sequences in the sequence aligning programs. The sequences were mapped to an in silico DpnII digested and repeat-masked version of HG18 (UCSC Genome browser, NCBI build 36.1). This digested version of HG18 was generated by mapping the position of all DpnII sites in the unmasked HG18 sequence. This in silico digestion had to be performed on an unmasked version of HG18 as repeat masking would also have masked some DpnII restriction sites. These positions were then used to digest a repeat-masked version of the HG18 genome to generate a multiple Fasta file of all repeat-masked DpnII fragments in HG18. Repeat masking was performed using the latest ‘all mapped repeat’ data for HG18 (UCSC tablebrowser HG18_Human_allrmsk.txt). The sequence data were mapped to the digested genome using three different aligners, MAQ (v. 0.7.1), Exonerate (v. 2.2.1) and Novoalign (v. 2.05.13; www.novocraft.com) to exclude any aligner-specific effects, using the default stringencies for each program. The alignments were post processed, using in-house Perl scripts, for uniqueness of alignment, with reads mapping to the α-globin genes being allowed to map to three locations of chromosome 16 owing to the duplicated nature of the genes; all other reads had to align uniquely to the genome.
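The two-step digestion described above (locate sites on the unmasked sequence, then cut the masked sequence at those positions) can be sketched as follows. This is a minimal illustration in Python and not the in-house code used in the study; the function names and the single-string sequence representation are assumptions made for the example.

```python
import re


def dpnii_cut_positions(unmasked_seq):
    """Return 0-based DpnII cut positions located on the UNMASKED sequence.

    DpnII cuts immediately 5' of its GATC recognition site (^GATC), so each
    cut position is the start index of a GATC match.
    """
    return [m.start() for m in re.finditer("GATC", unmasked_seq.upper())]


def digest_masked(unmasked_seq, masked_seq):
    """Cut the repeat-masked sequence at sites found in the unmasked one.

    Returns a list of (start, end, fragment_sequence) with 0-based,
    half-open coordinates. Hypothetical helper, for illustration only.
    """
    if len(unmasked_seq) != len(masked_seq):
        raise ValueError("masked and unmasked sequences must have equal length")
    bounds = [0] + dpnii_cut_positions(unmasked_seq) + [len(masked_seq)]
    fragments = []
    for start, end in zip(bounds, bounds[1:]):
        if end > start:  # skip the empty fragment when a site sits at position 0
            fragments.append((start, end, masked_seq[start:end]))
    return fragments
```

Because the cut positions come from the unmasked sequence, a GATC site that lies inside a masked repeat still delimits a fragment, which is the reason given in the text for digesting the unmasked genome first.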
(e) Junction analysis
(i) Solexa/Illumina paired-end reads
The paired-end reads represent a 50 bp sequence from each end of a sonicated DNA fragment of the inverse PCR products. As sonication is a largely random process, the paired-end reads randomly sample the sequence, at either end of approximately 500 bp intervals, across the inverse PCR products. This random sampling can be used to detect the ligation events that underlie the 4C signal, where one end of the fragment maps to one DpnII fragment and the other end of the fragment maps to another. Although, as described above, the paired-end reads were initially mapped as single reads, the physical relationship between the paired-end reads was maintained in the naming structure of each read. Using in-house Perl scripts, this relationship could be used to call an interaction between these two fragments and ultimately count the total number of mapped interactions between all DpnII fragments in the HG18 genome. This dataset ignored all pairs that mapped to the same fragment (and so did not span a junction) and was further refined to remove interactions between fragments adjacent in the genome owing to local interactions or non-digestion. This dataset was further constrained so that each junction had to have a read in the capture fragment and hence represent a valid ligation event rather than non-specific amplification. Data from the sequencing experiments are available from the GEO database under the accession no. GSE42384.
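The filtering logic described above can be sketched in a few lines. The example below is an illustration in Python, not the in-house Perl scripts; it assumes (hypothetically) that read names end in `/1` or `/2` to identify mates, and that each uniquely mapped end has already been assigned to a DpnII fragment identified by its chromosome and ordinal position along that chromosome.

```python
from collections import Counter


def call_interactions(mapped_ends, bait_fragments):
    """Count bait-to-fragment junctions from singly mapped paired-end reads.

    mapped_ends: iterable of (read_name, (chrom, frag_index)); read_name is
        assumed to end in '/1' or '/2' (hypothetical naming convention).
    bait_fragments: set of (chrom, frag_index) identifying the capture
        fragment(s); the duplicated alpha-globin promoters would give two.
    """
    # Re-link the two ends of each pair using the shared read name.
    by_pair = {}
    for name, frag in mapped_ends:
        pair_id, _, mate = name.rpartition("/")
        by_pair.setdefault(pair_id, {})[mate] = frag

    counts = Counter()
    for ends in by_pair.values():
        if "1" not in ends or "2" not in ends:
            continue  # one end failed to map uniquely
        a, b = ends["1"], ends["2"]
        if a == b:
            continue  # both ends in the same fragment: no junction spanned
        if a[0] == b[0] and abs(a[1] - b[1]) == 1:
            continue  # adjacent fragments: non-digestion or local ligation
        # Require exactly one end in the bait, i.e. a valid capture event.
        if a in bait_fragments and b not in bait_fragments:
            counts[b] += 1
        elif b in bait_fragments and a not in bait_fragments:
            counts[a] += 1
    return counts
```

The returned counter maps each interacting fragment to the number of read pairs linking it to the bait, which corresponds to the interaction counts used in the overlap analysis.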
(ii) Overlap analysis
Both biological 4C paired-end sequenced replicates (named JH1 and JH2) were analysed as described above. To identify the consistent signal between the two biological datasets, an intersection of the two results was performed such that interaction fragments that existed in both datasets were kept and the number of interactions of that fragment with the α-globin promoter were averaged. This analysis was made less stringent by the inclusion of fragments, which although they did not have exactly the same genomic coordinates, their coordinates fell within a 1 kb window of genomic distance of a peak in the other replicate. In this case, the signal for each peak was not averaged with the other, rather each was included in the final dataset with their respective coordinates and number of interactions with the α-globin promoter.
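A minimal sketch of this intersection is given below. It is written in Python for illustration and is not the original analysis code. Each replicate is assumed to be a dictionary mapping a fragment `(chrom, start, end)` to its interaction count, and the 1 kb window is interpreted here as the gap between fragment edges; the text does not specify how the distance was measured, so that choice is an assumption.

```python
def _within_window(frag_a, frag_b, window):
    """True if two fragments lie on the same chromosome and the gap between
    their edges is at most `window` bp (overlapping fragments give a
    negative gap and therefore also qualify)."""
    if frag_a[0] != frag_b[0]:
        return False
    gap = max(frag_a[1], frag_b[1]) - min(frag_a[2], frag_b[2])
    return gap <= window


def intersect_replicates(rep1, rep2, window=1000):
    """Keep fragments supported by both replicates.

    Exact coordinate matches are kept once with the mean of the two counts.
    Near matches (within `window` bp of a fragment in the other replicate)
    are kept separately with their own coordinates and unaveraged counts.
    """
    merged = {}
    for frag, count in rep1.items():
        if frag in rep2:
            merged[frag] = (count + rep2[frag]) / 2
    for this_rep, other_rep in ((rep1, rep2), (rep2, rep1)):
        for frag, count in this_rep.items():
            if frag in merged:
                continue
            if any(_within_window(frag, other, window) for other in other_rep):
                merged[frag] = count
    return merged
```

The pairwise comparison is quadratic in the number of fragments, which is acceptable for a few thousand interactions per replicate; a sorted or interval-tree lookup would be preferable for larger datasets.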
This intersection dataset was then binned into a set of genomic regions representing each human chromosome, the chromosome 16p terminal 1 Mb (chr16:1–1 000 000), the ENCODE region ENm008 (chr16:1–500 000), the region from the 16p telomere to the end of the globin θ gene (chr16:1–170 000), the area covering MCS-R1 to MCS-R3 (chr16:93 900–111 261) and the region 1 kb either side of the MCS-R2 element (chr16:102 493–104 848).
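The binning step can be illustrated with the short Python sketch below, which uses the region coordinates listed above. It is not the original analysis code: the region labels are invented for the example, and a fragment is assigned to a region only if it lies entirely within it, which is an assumption about how boundary fragments were handled.

```python
# Nested chr16 regions taken from the text (1-based, inclusive coordinates).
# The dictionary keys are illustrative labels, not names used in the study.
REGIONS = {
    "terminal_1Mb": ("chr16", 1, 1_000_000),
    "encode_region": ("chr16", 1, 500_000),
    "telomere_to_theta": ("chr16", 1, 170_000),
    "MCS-R1_to_MCS-R3": ("chr16", 93_900, 111_261),
    "MCS-R2_1kb_flank": ("chr16", 102_493, 104_848),
}


def bin_interactions(interactions, regions=REGIONS):
    """Sum interaction counts per chromosome and per nested region.

    interactions: dict mapping (chrom, start, end) -> interaction count.
    Returns (per_chromosome, per_region, fraction_of_total_per_region).
    """
    total = sum(interactions.values())
    per_chrom = {}
    per_region = {name: 0 for name in regions}
    for (chrom, start, end), count in interactions.items():
        per_chrom[chrom] = per_chrom.get(chrom, 0) + count
        for name, (r_chrom, r_start, r_end) in regions.items():
            # ASSUMPTION: a fragment counts only if fully inside the region.
            if chrom == r_chrom and start >= r_start and end <= r_end:
                per_region[name] += count
    fractions = {
        name: (count / total if total else 0.0)
        for name, count in per_region.items()
    }
    return per_chrom, per_region, fractions
```

Because the regions are nested, a single fragment near MCS-R2 contributes to every enclosing bin, so the per-region totals are cumulative rather than mutually exclusive.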
This work was supported by the Medical Research Council (UK) and National Institute for Health Research (NIHR) Biomedical Research Center Program. J.R. is supported by the Wellcome Trust. We thank Joyce Reittie for her technical assistance. We thank David Garrick for his advice and comments on the manuscript; Nicki Gray for help preparing the manuscript; Cordelia Langford, Peter Ellis and the staff of the Wellcome Trust Sanger Institute Microarray Facility for array printing; Lorna Gregory, Yongjun Zhao and Steve Jones for sequencing support; and Zong-Pei Han of CBRG Oxford for computational and systems support. J.R.H., R.J.G. and D.R.H. designed research. J.R.H., K.M.L., M.D.G., J.A.S.-S. and D.V. performed experiments. J.R.H., I.D. and S.T. analysed data. S.M. provided database support. J.R. provided resources. J.R.H., R.J.G. and D.R.H. wrote the manuscript. The authors declare no competing financial interest.
One contribution of 12 to a Discussion Meeting Issue ‘Regulation from a distance: long-range regulation of gene expression’.
© 2013 The Author(s). Published by the Royal Society. All rights reserved.