In this special edition of Genomics, we present reviews of the current state of the field in identifying and functionally understanding transcriptional enhancers in cells and developing tissues. Enhancers are short (50-1000 bp) DNA sequences that precisely regulate the expression of target genes. Along with promoters, they likely constitute the vast majority of the regulatory sequences in metazoan genomes. They are essential for development and function in animals and plants, although the first enhancer to be identified was a viral enhancer from the SV40 genome (Banerji et al., 1981).
Typically several enhancers coordinate the expression of an individual target gene, each controlling that gene's expression in specific cell types at specific times. Until recently, identifying each gene's enhancers had been challenging because enhancers do not occupy prescribed locations relative to their target genes. Enhancers may be upstream, downstream, or within their target genes. They may be close to the transcriptional start site or as far as 1 million base pairs away. In the past, enhancers were painstakingly identified one-by-one. However, a number of powerful techniques have been recently developed that make it relatively straightforward to identify putative and actual enhancers on a genome-wide scale.
A central challenge in identifying enhancers in a given tissue or cell type is effectively isolating the relevant cells. Recently there have been powerful advances combining cell isolation and DNA sequencing that make it possible to identify the majority of enhancers in virtually any cell type. Bowman's review, “Discovering enhancers by mapping chromatin features in primary tissue,” discusses the wide range of techniques to isolate cells or nuclei from primary tissues. These techniques range from bench-top affinity purification to traditional fluorescence-activated cell sorting (FACS). Particularly exciting are developments in bench-top methods, which make it possible for individual laboratories to analyze tissue-specific enhancers directly, without needing to invest money or time at a FACS facility. These approaches are gaining traction, as exemplified by the widespread use of the “Isolation of Nuclei Tagged in specific Cell Types” (INTACT) method (Deal and Henikoff, 2011), which has now been used in Xenopus, Arabidopsis, Drosophila, and C. elegans.
Once cells are isolated, enhancers can be identified across the entire genome based on unique biochemical signatures. For example, enhancers tend to be largely nucleosome-free and therefore can be identified by methods that detect open chromatin, such as DNase-Seq, which detects DNase-accessible sites (Crawford et al., 2006), and ATAC-Seq, which detects transposase-accessible sites (Buenrostro et al., 2013). A more refined method is to identify loci bound by modified histones that are better correlated with enhancer activity: histone H3 monomethylated at lysine 4 (H3K4me1) (Heintzman et al., 2007), and histone H3 acetylated at lysine 27 (H3K27ac) (Creyghton et al., 2010).
An important question is which of these putative enhancers are actually functional in a cell or tissue. Recently developed methods such as STARR-seq and FIREWACh can identify functional enhancers at a genome-wide scale. Murdter et al. describe Self-Transcribing Active Regulatory Region sequencing (STARR-seq) in their article “STARR-seq - Principles and Applications.” STARR-seq is a quantitative method that screens millions of DNA fragments in parallel for their enhancer activity. In STARR-seq, DNA libraries of putative enhancers are cloned into the 3’ UTR of a reporter construct. In this way, each putative enhancer serves as its own unique barcode within the reporter transcript. After transfection into cells, the activity of millions of putative enhancers can be measured in parallel using sequencing. DNA libraries of putative enhancers can be obtained from arbitrary sources of DNA, including genomic DNA, TF binding sites, or predicted enhancers. STARR-seq has been performed in Drosophila cells, leading to the identification of thousands of functionally active enhancers (Arnold et al., 2013). Attempts to apply genome-wide STARR-seq in mammals have been complicated by the larger size of the mammalian genome, which requires construction of much larger libraries and sequencing to much greater depths. As a result, newer methods that focus STARR-seq on selected portions of the mammalian genome, such as CapStarr-seq (Vanhille et al., 2015), are likely to become more widely used in mammals.
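To make the quantitative readout concrete, the following minimal sketch scores each fragment by its abundance among reporter transcripts relative to its abundance in the input library. The function name, the pseudocount, and the counts-per-million normalization are illustrative assumptions and not the published STARR-seq analysis pipeline.

```python
import math


def starr_enrichment(rna_counts, input_counts, pseudocount=1.0):
    """Return log2(RNA / input) per fragment after library-size scaling.

    rna_counts and input_counts map a fragment ID to its read count.
    All names and defaults here are hypothetical choices for illustration.
    """
    rna_total = sum(rna_counts.values())
    input_total = sum(input_counts.values())
    if rna_total == 0 or input_total == 0:
        raise ValueError("both libraries need at least one read")

    scores = {}
    for fragment, dna_reads in input_counts.items():
        rna_reads = rna_counts.get(fragment, 0)
        # Scale to counts per million so libraries of different depth compare.
        rna_cpm = (rna_reads + pseudocount) / rna_total * 1e6
        dna_cpm = (dna_reads + pseudocount) / input_total * 1e6
        scores[fragment] = math.log2(rna_cpm / dna_cpm)
    return scores


if __name__ == "__main__":
    rna = {"frag_a": 30, "frag_b": 10}
    dna = {"frag_a": 10, "frag_b": 30}
    for fragment, score in starr_enrichment(rna, dna).items():
        print(fragment, round(score, 2))
```

A positive score means a fragment is over-represented among transcripts relative to the input library, which is the behavior expected of an active enhancer in this kind of assay. The pseudocount only keeps fragments with zero reads from producing undefined values.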
In her article (“High Throughput Technologies for the Functional Discovery of Mammalian Enhancers: New Approaches for Understanding Transcriptional Regulatory Network Dynamics”), Dailey describes Functional Identification of Regulatory Elements Within Accessible Chromatin (FIREWACh), a method in which DNase-accessible regions are cloned and subsequently screened for their activity. FIREWACh is more feasible in mammalian cells because it narrows the screen to DNase-accessible regions. This eliminates the cloning, screening, and sequencing of extraneous sequences, dramatically simplifying the screening process. Moving forward, STARR-seq, FIREWACh, and related methods are likely to uncover the dynamics of enhancers over the course of development and disease, such as in malignant transformation of mammalian cells.
Once enhancers have been found, it remains a challenge to discover the nucleotides within them that are required for their function. Systematic mutagenesis is impractical: even a single 100 bp stretch of DNA has far more possible sequence variants (4^100, or roughly 10^60) than could fit within the genomes of all the human beings on the planet. An emerging tool called Massively Parallel Reporter Assays (MPRAs) or CRE-seq, described both by Inoue and Ahituv (“Decoding enhancers using massively parallel reporter assays”) and by White (“Understanding how cis-regulatory function is encoded in DNA sequence using massively parallel reporter assays and designed sequences”), is enabling this sequence space to be explored. In MPRA experiments, designer libraries of oligonucleotides can be commercially obtained and tested for their enhancer activity using RNA-Seq. While the throughput of MPRA does not allow for brute force mutagenesis of multiple enhancers, it nonetheless enables searches for guiding principles (or “regulatory grammar”). These assays have demonstrated large-scale functional validation of putative cis-regulatory elements (CREs), exhaustive mutational analyses of individual regulatory sequences, and tests of large libraries of synthetic CREs. Synthetic enhancers also allow for testing combinations of transcription factor binding sites that do not occur in the genome, thereby exploring non-natural “sequence space” to better understand the regulatory logic involved in cis-regulatory element function. Variants in enhancer regions affect phenotype and are implicated in human diseases (Ahituv et al., 2012). High-throughput reporter assays currently have some limitations. While MPRAs enable the investigation of the function of programmable sequences, they are limited to sequences of up to several hundred nucleotides, which is likely shorter than many regulatory sequences.
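The scale of the sequence space mentioned at the start of this paragraph can be checked with a short calculation. The sketch below compares the number of possible 100 bp sequences with the number of 100 bp windows available in every human genome combined. The population size and genome length are round-number assumptions introduced here for illustration, and the function names are hypothetical.

```python
def possible_sequences(length_bp, alphabet_size=4):
    """Number of distinct DNA sequences of a given length (A, C, G, T)."""
    return alphabet_size ** length_bp


def windows_available(people, genome_bp, length_bp):
    """Non-overlapping windows of length_bp across all genomes combined."""
    return people * (genome_bp // length_bp)


# Assumptions for illustration only (not taken from the article):
WORLD_POPULATION = 8_000_000_000       # roughly 8 billion people
HAPLOID_GENOME_BP = 3_200_000_000      # roughly 3.2 billion base pairs

variants = possible_sequences(100)
capacity = windows_available(WORLD_POPULATION, HAPLOID_GENOME_BP, 100)

print(f"possible 100 bp sequences: {variants:.2e}")
print(f"100 bp windows in all human genomes: {capacity:.2e}")
print(f"shortfall factor: {variants / capacity:.2e}")
```

Under these assumptions the sequence space is on the order of 10^60, while all human genomes together offer on the order of 10^17 windows, so exhaustive testing is out of reach by dozens of orders of magnitude.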
Another critical question in understanding functional enhancers is what dictates enhancer-promoter interactions. Enhancer function was originally defined based on plasmid experiments as being distance- and orientation-independent (Banerji et al., 1981), but how does distance affect enhancer function across local genomic environments or at genome-wide scales? Here, Pindyurin et al. (“TRIP through the chromatin: a high throughput exploration of enhancer regulatory landscapes”) discuss a new direction in the field to study the cumulative effects of many enhancers within their genomic contexts. The approach they discuss, called “Thousands of Reporters Integrated in Parallel” (“TRIP”), involves integrating a barcoded reporter at random throughout the genome in tissue culture cells. In an early use of the technique, a constitutively active enhancer-promoter-reporter construct was integrated into thousands of locations, each of which was monitored by a unique barcode in the 3’ UTR of the reporter (Akhtar et al., 2014). Interestingly, the expression of the reporter varied by as much as 3 orders of magnitude, depending in large part on its distance to the nearest enhancer. This result upends the original idea that enhancer function is distance-independent. Instead, whether an enhancer has the potential to interact with a promoter depends almost entirely on its distance from that promoter. Enhancers in mammalian genomes that are within tens to hundreds of kilobases of promoters are able to interact, whereas those beyond these distances -- or outside of topologically associated domains (TADs) -- are not. This idea has been corroborated by genome-wide scans of chromatin interactions using Hi-C, which show that DNA-DNA interactions decline precipitously with linear genomic distance (van Berkum et al., 2010).
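As a rough illustration of how a barcoded readout like this can be summarized, the sketch below normalizes transcript-derived barcode counts by the counts of the same barcode in genomic DNA, then reports the spread of expression across integration sites in orders of magnitude. The function names, the minimum-count filter, and the example counts are hypothetical and are not taken from the TRIP protocol.

```python
import math


def barcode_expression(cdna_counts, gdna_counts, min_gdna=1):
    """Expression per barcode as cDNA reads per genomic DNA read.

    Barcodes with fewer than min_gdna genomic reads are skipped because
    their integration cannot be confirmed (an illustrative filter).
    """
    expression = {}
    for barcode, gdna_reads in gdna_counts.items():
        if gdna_reads < min_gdna:
            continue
        expression[barcode] = cdna_counts.get(barcode, 0) / gdna_reads
    return expression


def orders_of_magnitude(expression):
    """log10 of the ratio between the highest and lowest nonzero values."""
    positive = [value for value in expression.values() if value > 0]
    if not positive:
        raise ValueError("no barcode has detectable expression")
    return math.log10(max(positive) / min(positive))


if __name__ == "__main__":
    cdna = {"AAA": 1000, "CCC": 10, "GGG": 1}
    gdna = {"AAA": 10, "CCC": 10, "GGG": 10, "TTT": 0}
    expr = barcode_expression(cdna, gdna)
    print(expr)
    print(round(orders_of_magnitude(expr), 2))
```

Dividing by the genomic count corrects for how many cells carry each integration, so that differences between barcodes reflect the activity of the reporter at each location and not the abundance of the clone.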
We can now identify functional enhancers in specific cells, but what about assaying enhancer function during organismal development? One powerful organismal system for understanding enhancer function as it relates to development and especially mammalian disease is the mouse. Reviews by Nord (“Learning about mammalian gene regulation from functional enhancer assays in the mouse”) and Kvon (“Using transgenic reporter assays to functionally characterize enhancers in animals”) describe the original “enhancer trap” strategy to screen for regulatory sequences that drive expression in a cell- or tissue-specific manner. The VISTA database is a remarkable resource containing expression data from more than 2000 mouse and human sequences at embryonic day 11.5. The ability in mice to perform precise manipulations of endogenous genomic loci and examine resulting organismal phenotypes -- now being done with increasing frequency -- has confirmed that individual enhancers can influence limb development, facial morphology, brain structure, and body size.
In addition to the mammalian VISTA database, a database which is bound to have a major impact on our understanding of developmental enhancers is FlyEnhancers. As discussed in Kvon's review, FlyEnhancers currently reports the activity of over 7,700 2-kb DNA fragments cloned from the Drosophila genome and tested in transgenic reporter assays for activity in Drosophila embryos (Kvon et al., 2014). The fragments constitute 13.5% of the Drosophila noncoding non-repetitive genome, representing the most comprehensive examination to date of how enhancer activity is encoded across a genome. Over 45% of the tested fragments were found to act as enhancers in fly embryos, suggesting that the fly genome may have 50,000–100,000 developmentally active enhancers. Interestingly, 10-20% of the enhancers reported appear to skip over their nearest neighboring genes to regulate the expression of a more distal gene. Based on similar extrapolations, it has been estimated that the mouse genome may have on the order of 1 million developmentally active enhancers. As discussed by Kvon, the positive and negative data from FlyEnhancers will serve as both a guide and inspiration for continued approaches to functionally identify enhancers across animal genomes.
High-throughput functional assays of enhancers, whether highly laborious or highly parallel, have proven to be powerful in generating knowledge of enhancer function. These assays all treat the enhancer as a fundamental unit of regulation. Treating the enhancer as a unit is fully appropriate to study how transcription factors cooperate at an enhancer or to screen for genetic variants that alter enhancer function. Unfortunately, the throughput of functional assays of enhancers drops precipitously once the unit of regulation becomes the gene instead of the enhancer -- yet it is gene regulation that ultimately matters. There is great potential in extending high-throughput assays of enhancers to high-throughput assays of their function in regulating genes in the genome. Perhaps most promisingly, success in parallelizing Cas9-based genome editing to assess the effects of mutating many enhancers simultaneously could bring the power of massively parallel reporter assays to the genome.
Acknowledgments
We would like to thank all of the authors who contributed to this special edition of Genomics. JMG was supported by NIMH R01-MH101528. MM was supported by a UMass Life Sciences Moment Fund grant.
References
- Ahituv N, editor. Gene Regulatory Sequences and Human Disease. Springer; New York: 2012. p. 283.
- Akhtar W, Pindyurin AV, de Jong J, Pagie L, Ten Hoeve J, Berns A, Wessels LF, van Steensel B, van Lohuizen M. Using TRIP for genome-wide position effect analysis in cultured cells. Nature Protocols. 2014;9(6):1255–81. doi: 10.1038/nprot.2014.072.
- Arnold CD, Gerlach D, Stelzer C, Boryń ŁM, Rath M, Stark A. Genome-wide quantitative enhancer activity maps identified by STARR-seq. Science. 2013;339(6123):1074–7. doi: 10.1126/science.1232542.
- Banerji J, Rusconi S, Schaffner W. Expression of a beta-globin gene is enhanced by remote SV40 DNA sequences. Cell. 1981:299–308. doi: 10.1016/0092-8674(81)90413-x.
- Buenrostro JD, Giresi PG, Zaba LC, Chang HY, Greenleaf WJ. Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nature Methods. 2013;10:1213–1218. doi: 10.1038/nmeth.2688.
- Crawford GE, et al. DNase-chip: a high-resolution method to identify DNase I hypersensitive sites using tiled microarrays. Nature Methods. 2006;3:503–509. doi: 10.1038/nmeth888.
- Creyghton MP, et al. Histone H3K27ac separates active from poised enhancers and predicts developmental state. Proc Natl Acad Sci U S A. 2010;107:21931–21936. doi: 10.1073/pnas.1016071107.
- Deal RB, Henikoff S. The INTACT method for cell type-specific gene expression and chromatin profiling in Arabidopsis thaliana. Nature Protocols. 2011:56–68. doi: 10.1038/nprot.2010.175.
- Heintzman ND, et al. Distinct and predictive chromatin signatures of transcriptional promoters and enhancers in the human genome. Nat Genet. 2007;39:311–318. doi: 10.1038/ng1966.
- Kvon EZ, Kazmar T, Stampfel G, Yáñez-Cuna JO, Pagani M, Schernhuber K, Dickson BJ, Stark A. Genome-scale functional characterization of Drosophila developmental enhancers in vivo. Nature. 2014;512(7512):91–5. doi: 10.1038/nature13395.
- van Berkum NL, et al. Hi-C: a method to study the three-dimensional architecture of genomes. Journal of Visualized Experiments (JoVE). 2010. doi: 10.3791/1869.
- Vanhille L, Griffon A, Maqbool MA, Zacarias-Cabeza J, Dao LT, Fernandez N, Ballester B, Andrau JC, Spicuglia S. High-throughput and quantitative assessment of enhancer activity in mammals by CapStarr-seq. Nat Commun. 2015;6:6905. doi: 10.1038/ncomms7905.