Author manuscript; available in PMC: 2023 Oct 2.
Published in final edited form as: Chromosoma. 2023 May 15;132(3):167–189. doi: 10.1007/s00412-023-00796-5

Using Synthetic DNA Libraries to Investigate Chromatin and Gene Regulation

Holly Kleinschmidt 1,2, Cheng Xu 1,2, Lu Bai 1,2,3
PMCID: PMC10542970  NIHMSID: NIHMS1931647  PMID: 37184694

Abstract

Despite the recent explosion in genome-wide studies in chromatin and gene regulation, we are still far from extracting a set of genetic rules that can predict the function of the regulatory genome. One major reason for this deficiency is that gene regulation is a multi-layered process that involves an enormous variable space, which cannot be fully explored using native genomes. This problem can be partially solved by introducing synthetic DNA libraries into cells, a method that can test the regulatory roles of thousands to millions of sequences with limited variables. Here, we review recent applications of this method to study transcription factor (TF) binding, nucleosome positioning, and transcriptional activity. We discuss the design principles, experimental procedures, and major findings from these studies and compare the pros and cons of different approaches.

Keywords: Synthetic DNA library, Chromatin accessibility, Gene regulation, MPRA, STARR-seq

Introduction

Recent progress in next-generation sequencing technologies has led to an explosion of genome-wide studies of protein binding, chromatin conformation, and gene expression. Advances in genetic techniques, like CRISPR/Cas9, allow researchers to generate targeted perturbations to DNA and RNA and measure the biological consequences. These genomic and genetic studies have greatly improved our understanding of mechanisms of gene regulation in healthy and diseased cells. Modulation of gene expression also represents a promising forefront in treatment of certain human diseases.

Despite enormous progress, it is still difficult to extract the genetic rules that determine factor binding, chromatin states, and gene expression level. In general, we have limited ability to predict the localization of transcription factors (TFs), patterns of nucleosome positioning, and mRNA abundance. One reason for this deficiency is that all of the above are complex, multivariable processes. For example, TF binding is affected by motif strength, DNA shape, co-factors, and chromatin context, and transcription level is affected by promoter and enhancer strengths, nucleosome occupancy, co-factor availability, 3D genome organization, etc. In genomic measurements, effects from all these variables mix together, making it difficult to evaluate the contribution of individual variables. The sequence space required to fully explore the combinations of these variables is extremely large, far exceeding the variation provided by the native genome, especially considering evolutionary constraints.

To solve the problem above, it is important to study these processes within a controlled variable space. Ideally, there should be an experimental system where variables can be selectively perturbed one at a time while all the other variables are kept constant. This can be achieved by introducing synthetic DNA libraries into the cells. By engineering artificially designed sequences into the same plasmid or chromatin background, these assays, in theory, allow us to measure a biological output (e.g. TF binding, chromatin accessibility, or transcription level) while changing one genetic rule at a time (e.g. number of TF motifs, motif strength, co-factor presence, ± disease-associated mutations). Similarly, the same sequences can be integrated into different chromosome loci, making it possible to evaluate the effect of chromatin context. Depending on the experimental design, 10^2–10^8 synthetic sequences can be interrogated simultaneously, making it an efficient tool for genotype–phenotype mapping.
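The "one variable at a time" design principle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the background sequence, the motif names and sequences, and the function `build_library` are hypothetical and are not taken from any of the reviewed studies. Each variant differs from the others only in which motif is embedded and where, while the rest of the sequence is held constant.

```python
from itertools import product


def build_library(background, motifs, positions):
    """Embed each motif at each position of one constant background.

    background: constant DNA sequence shared by every variant (hypothetical)
    motifs:     dict mapping a motif name to its sequence (hypothetical)
    positions:  0-based start positions at which a motif replaces background
    Returns a dict keyed by (motif name, position) with the variant sequence.
    """
    variants = {}
    for (name, motif), pos in product(motifs.items(), positions):
        if pos < 0 or pos + len(motif) > len(background):
            raise ValueError(f"motif {name!r} does not fit at position {pos}")
        # Replace (rather than insert) so that every variant keeps the same length.
        variants[(name, pos)] = background[:pos] + motif + background[pos + len(motif):]
    return variants


if __name__ == "__main__":
    library = build_library(
        background="A" * 30,
        motifs={"strong": "TGACTCA", "weak": "TGACTAA"},
        positions=[5, 15],
    )
    for key, seq in sorted(library.items()):
        print(key, seq)
```

The library size is simply the product of the number of levels of each variable, which is why a handful of variables quickly reaches thousands of sequences.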

In this paper, we review recent studies that introduce synthetic DNA libraries into cells or organisms to study TF binding, nucleosome positioning, and transcription level, summarized in Table 1. There are many other elegant methods using synthetic libraries to study similar processes in vitro. These topics are covered by excellent reviews, e.g. (Andrilenas et al. 2015; Nguyen et al. 2014; Orenstein and Shamir 2017; Zhuo et al. 2017), and will not be repeated here.

Table 1.

Summary of methods, parameters, and applications

| Method | Purpose | DNA Source | Variable DNA Length | Library Sizes Tested | Integrated or Extrachromosomal | Quantification | References |
|---|---|---|---|---|---|---|---|
| MPRA, MPFD, CRE-seq, Sharpr-MPRA, lentiMPRA, LV-MPRA, QeFS | Enhancer activity | Synthetic or Genomic PCR products | < 200 bp (synthesized) or > 200 bp (PCA or PCR) | Thousands | Extrachromosomal or integrated | RNA-seq on barcode | Patwardhan et al. 2012, Melnikov et al. 2012, Kwasnieski et al. 2014, Kheradpour et al. 2013, Mogno et al. 2013, Smith et al. 2013, White et al. 2013, Kwasnieski et al. 2014, Ernst et al. 2016, Fiore and Cohen 2016, Nguyen et al. 2016, Shen et al. 2016, White et al. 2016, Grossman et al. 2017, Inoue et al. 2017, Maricque et al. 2016, Inoue et al. 2019, Waters et al. 2021, Kreimer et al. 2022, Martinez-Ara et al. 2022 |
| STARR-seq, CapStarr-seq, ChIP-STARR-seq, ATAC-STARR-seq, WHG-STARR-seq | Enhancer activity | Synthetic DNA or Genomic DNA (whole-genome, BAC, captured, ChIP, or ATAC samples) | ~0.5–1.5 kb | Millions | Extrachromosomal | RNA-seq on enhancer library | Arnold et al. 2013, Arnold et al. 2014, Shlyueva et al. 2014b, Yáñez-Cuna et al. 2014, Zabidi et al. 2015, Vanhille et al. 2015, Dao et al. 2017, Liu et al. 2017a, Muerdter et al. 2018, Barakat et al. 2018, Klein et al. 2018b, Wang et al. 2018, Doni Jayavelu et al. 2020, Peng et al. 2020, Chen et al. 2022b, Neumayr et al. 2022, Sahu et al. 2022 |
| SIF-seq | Enhancer activity | Genomic DNA fragments | ~1–2 kb | Thousands | Integrated (single locus) | Reporter protein fluorescence (FACS) | Dickel et al. 2014 |
| FIREWACh | Enhancer activity | Genomic NFR fragments | ~150 bp | Thousands | Integrated (lentiviral) | Reporter protein fluorescence (FACS) | Murtha et al. 2014 |
| CRISPR-Cas9 based (MERA, CREST-seq, ScanDel, Perturb-seq, CRISP-seq, CROP-seq, CRISPRpath) | Endogenous enhancer activity | Synthetic (gRNA) | ~20–130 bp | Thousands | Extrachromosomal or integrated | Reporter protein fluorescence (FACS); cell viability; RNA-seq on barcode; RNA-seq on whole transcriptome | Canver et al. 2015, Diao et al. 2016, Jaitin et al. 2016, Korkmaz et al. 2016, Rajagopal et al. 2016, Sanjana et al. 2016, Datlinger et al. 2017, Diao et al. 2017, Ren et al. 2021 |
| CRISPR-dCas9 based (CRISPRi, CRISPRa, CERES, Perturb-seq, Mosaic-seq, crisprQTL, CRISPRi-FlowFISH, TAP-seq, direct-capture Perturb-seq, CRISPRpath) | Endogenous enhancer activity | Synthetic (gRNA) | ~20–130 bp | Millions | Extrachromosomal | Reporter protein fluorescence (FACS); cell viability; RNA-seq on whole transcriptome | Gilbert et al. 2013, Adamson et al. 2016, Fulco et al. 2019, Klann et al. 2017, Fulco et al. 2019, Gasperini et al. 2019, Replogle et al. 2020, Schraivogel et al. 2020, Ren et al. 2021, Chen et al. 2022b |
| ReSE | Silencer activity | Genomic FAIRE fragments | ~200 bp | Thousands | Integrated | Cell viability assay | Pang and Snyder 2020 |
| MPRA, CRE-seq, SPECS, GPRA, FACS-seq, patchMPRA | Promoter activity | Synthetic or Genomic PCR products | < 200 bp (synthesized) or > 200 bp (PCR) | Thousands or millions (GPRA) | Extrachromosomal or integrated | Reporter protein fluorescence (FACS) or RNA-seq on barcode | Gertz et al. 2009, Kinney et al. 2010, Sharon et al. 2012, Mogno et al. 2013, Lubliner et al. 2015, de Boer et al. 2020, Mattioli et al. 2019, Weingarten-Gabbay et al. 2019, Wu et al. 2019, Kotopka and Smolke 2020, Renganaath et al. 2020, Yu et al. 2021, Hong and Cohen 2022, Martinez-Ara et al. 2022 |
| TRIP | Promoter activity at different loci | Synthetic | ~20 bp (barcode) | Thousands | Integrated | RNA-seq on barcode | Akhtar et al. 2013 |
| STAP-seq | Promoter activity | Genomic DNA fragments | ~200 bp | Millions | Extrachromosomal | RNA-seq on tagged transcripts | Arnold et al. 2017 |
| SuRE | Promoter activity | Genomic DNA fragments | ~0.2–2 kb | Millions | Extrachromosomal | RNA-seq on barcode | van Arensbergen et al. 2017 |
| MNase-ISO, Methyltransferase-ISO | Nucleosome occupancy and positioning | Synthetic | 60–100 bp | Hundreds to tens of thousands | Integrated | Chromatin accessibility | Levo et al. 2017, Yan et al. 2018, Hammelman et al. 2020, Chen et al. 2022a |

Using Synthetic DNA Libraries to Investigate TF Binding

Sequence-specific TFs are essential for regulation of gene expression, as they determine the time, location, and strength of transcription by binding to cis-regulatory elements (CREs) in a subset of cell types. Characterizing TF binding preferences is thus important to understand their regulatory roles. Binding motifs, which are consensus DNA sequences recognized by TFs, are commonly used to describe their binding preferences. However, only a fraction of consensus motifs in the genome are bound by the corresponding TF (Wang et al. 2012), and vice versa, not all genomic TF binding loci contain strong motifs (Kribelbauer et al. 2019). Although multiple mechanisms have been proposed to explain such discrepancies, including the sequence and shape of DNA flanking the core motif, cooperative binding between TFs, chromatin accessibility, DNA methylation, histone modifications, and subnuclear compartmentalization (Inukai et al. 2017; Isbel et al. 2022; Kribelbauer et al. 2019; Slattery et al. 2014; Srivastava and Mahony 2020), our understanding of this issue is far from complete.

As mentioned above, synthetic DNA libraries provide unique advantages in probing multivariable genetic processes. Several studies using this method have been conducted in budding yeast or mammalian cells to investigate the mechanisms that determine TF binding preference. The general experimental procedure is described in Fig. 1, where DNA oligonucleotide pools containing WT or mutant TF motifs are first synthesized and cloned into a plasmid library (Fig. 1AB). The plasmid library is then delivered into cells, either transiently or integrated into specific loci (Fig. 1C). TF binding to the library sequences is then measured by chromatin immunoprecipitation (ChIP) (Grossman et al. 2017; Zeigler and Cohen 2014), retrotransposon insertion (or ‘calling cards’) (Liu et al. 2020), or DNA methylation (Levo et al. 2017; Szczesnik et al. 2020), followed by high-throughput sequencing (Fig. 1DE).

Fig. 1.


Overview of Methods Employing DNA Libraries to Study Gene Regulation. A) Oligonucleotide library design. DNA oligonucleotides included in the library can either be chemically synthesized as a microarray or derived from genomic fragments, followed by PCR amplification. B) Construction of the plasmid library. The oligonucleotide library in (A) can be cloned into a plasmid containing a reporter gene at variable locations, including upstream, in the 3′ UTR, or in the 5′ UTR. C) Delivery methods of the plasmid library into cells. The plasmid library can be delivered to the cells via transformation, transfection, or transduction, depending on the host organism and final library destination. The library can either be transiently transfected as an episomal library or integrated into one or more loci via adeno-associated, lentiviral, transposition, or site-specific methods. This approach is amenable to both in vitro and in vivo systems. D) Library data collection. The effect of DNA library sequences on gene expression can be determined by phenotypic selection, FACS (for expression of fluorescent proteins), or RNA-seq. Changes in chromatin are probed with different methods (see text for details). E) Library data analysis. High-throughput sequencing is used to identify the library sequences associated with different phenotypes (chromatin accessibility, gene expression level, etc.). The sequencing reads may reflect the library sequences directly, or barcodes paired with library sequences.

These studies found that motif strength positively correlates with TF binding (Grossman et al. 2017; Liu et al. 2020; Szczesnik et al. 2020). In addition, cooperativity and competition among TFs play a significant role in modulating TF binding in yeast (Liu et al. 2020; Zeigler and Cohen 2014). Incorporation of TF binding measurements into a gene expression model improves its predictive power (Zeigler and Cohen 2014). The findings in mammalian cells are more complicated, as motif occurrences tend to have low predictive power for TF binding. Two studies provide different explanations for such binding site selectivity. By swapping 25 different core motifs into 25 different flanking sequences, Grossman et al. (2017) found that in vivo binding of the adipogenesis regulator PPARγ on plasmids is predominantly determined by its core motifs. It was proposed that the site selectivity of PPARγ in the native genome is mainly due to differential chromatin accessibility and epigenetic modifications. Another study on the Wnt effector Tcf7l2 using an integrated sequence library showed that, although local chromatin accessibility plays a role, its binding specificity is heavily affected by the 99 bp of surrounding sequence (Szczesnik et al. 2020). In particular, the presence of Oct4 and Klf4 motifs promotes Tcf7l2 binding, and the effect oscillates with the distance between the Tcf7l2 and co-factor motifs with a 10.8 bp phasing, indicating the importance of interaction with co-factors positioned on the same face of the DNA helix. It is possible that different TFs use different strategies to achieve binding specificity. More TFs need to be studied to determine whether there is a dominant strategy.
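To make the helical-phasing argument concrete, the following Python sketch converts a motif-to-motif spacing into a rotational offset around the helix, using the 10.8 bp period quoted above. The cosine score is an illustrative assumption chosen because it peaks when two sites face the same side of the helix; it is not the model fitted in the cited study, and the function names are hypothetical.

```python
import math

HELICAL_REPEAT_BP = 10.8  # phasing period quoted in the text


def rotational_offset_deg(spacing_bp, repeat=HELICAL_REPEAT_BP):
    """Angle (0-360 degrees) between two sites separated by spacing_bp."""
    return (spacing_bp % repeat) / repeat * 360.0


def same_face_score(spacing_bp, repeat=HELICAL_REPEAT_BP):
    """Illustrative score: +1 when both sites face the same side of the
    helix, -1 when they face opposite sides (assumed cosine dependence)."""
    return math.cos(2.0 * math.pi * spacing_bp / repeat)


if __name__ == "__main__":
    for spacing in (0, 5, 11, 16, 22, 27, 32):
        print(spacing,
              round(rotational_offset_deg(spacing), 1),
              round(same_face_score(spacing), 2))
```

Under this simple picture, spacings near integer multiples of 10.8 bp would favor co-factor contacts, whereas spacings offset by about half a turn (roughly 5.4 bp) would disfavor them.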

Using a different experimental design, Vanzan et al. (2021) inferred pioneer transcription factor (PF) binding to DNA indirectly by measuring differential methylation status, and screened reported PFs for their ability to induce methylation changes. In this study, DNA libraries containing binding motifs of different mammalian TFs were either methylated in vitro or left unmethylated and then integrated into the mammalian genome. Changes in methylation levels after integration were then measured to infer the binding and effect of the corresponding PFs. The results revealed two groups of PFs: protective PFs (PPFs), which protect DNA from methylation, and super PFs (SPFs), which induce DNA demethylation at methylated binding sites.

Using Synthetic DNA Libraries to Investigate Chromatin Accessibility

As a conserved feature of the eukaryotic genome, nucleosomes adopt a “canonical” positioning pattern where replication origins, promoters, and active enhancers tend to be nucleosome-depleted, while the rest of the genome is mostly nucleosome-occupied. Such a positioning pattern is critical in regulating gene expression level, dynamics, and cell-to-cell variability (Bai and Morozov 2010; Jiang and Pugh 2009). More specifically, nucleosome-depleted regions (NDRs) in enhancers and promoters provide higher accessibility for TFs and the transcription machinery to bind, resulting in faster and more robust gene activation (Bai et al. 2010; Small et al. 2014; You et al. 2011; Zhang et al. 2013). To decipher the genetic rules of transcriptional activity, it is important to elucidate the mechanisms underlying chromatin accessibility.

Considerable experimental and computational efforts have been devoted to dissecting the sequence elements that lead to chromatin opening. The poly(A/T) sequence is well-known to disfavor nucleosome formation. Although originally thought to repel nucleosomes through its higher bending rigidity, more recent studies indicate that poly(A/T) recruits a remodeling complex, RSC, to actively remove nucleosomes (Kharerin and Bai 2021; Krietenstein et al. 2016; Lorch et al. 2014). This is one example of nucleosome depletion through chromatin remodelers (CRs). In addition, a subset of sequence-specific TFs, often referred to as “pioneer factors (PFs)”, play a significant role in chromatin opening (Zaret and Carroll 2011). PFs can recognize nucleosome-embedded binding sites, invade compact chromatin, and deplete nucleosomes to expose nearby sequences. This way, PFs can direct the binding of other TFs and allow them to regulate transcription.

Synthetic DNA libraries have been used to investigate the function of PFs and CRs in nucleosome positioning, and in particular, NDR formation and size control. Most of these investigations have been carried out in budding yeast, although studies in human cells have started to appear in recent years. The general experimental design is shown in Fig. 1. Similar to the method described above to study TF binding, designed sequences that are potential targets of PFs or CRs are synthesized and cloned into a plasmid library (Fig. 1AB and Fig. 2). The plasmids are then integrated into the yeast genome at a specific location, and the resulting nucleosome positioning is probed through micrococcal nuclease (MNase) digestion or DNA methylation followed by amplicon sequencing over the integrated sequences (Fig. 1CE). MNase preferentially digests naked DNA over nucleosomal DNA, and therefore, sequences protected by a nucleosome have a higher probability of surviving MNase digestion, which yields a higher level of PCR product (~100 bp) and more sequencing reads. For convenience, we name this method MNase-ISO (MNase over Integrated Synthetic Oligonucleotides). Alternatively, chromatin can be treated with a methyltransferase, like M.CviPI, that selectively methylates the cytosine in GC dinucleotides within exposed DNA (Jessen et al. 2004, 2006). After bisulfite conversion, the integrated synthetic regions are amplified (0.5–1.5 kb depending on the sequencing method) and subjected to amplicon sequencing (Methyl-ISO). In this way, methylation over continuous stretches of DNA containing the variable synthetic sequences can be detected. In comparison with MNase-ISO, which measures population-averaged nucleosome occupancy at a single location, Methyl-ISO provides nucleosome positioning information over a multi-nucleosomal region at the single-DNA/single-cell level. Although information-rich, Methyl-ISO involves more elaborate experimental steps and data analysis, so the choice between these methods should be guided by the purpose of the experiment.
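The per-molecule readout of Methyl-ISO can be illustrated with a minimal Python sketch. It assumes a single top-strand read that is already aligned base-for-base to the reference amplicon, with no indels and no endogenous cytosine methylation; these simplifications, and all function names, are assumptions made for illustration rather than the pipeline used in the cited studies. A cytosine in a GC dinucleotide that still reads as C after bisulfite conversion was methylated and is called accessible, whereas one that reads as T was unmethylated and is called protected.

```python
def gpc_positions(reference):
    """0-based indices of the cytosine in every GC dinucleotide."""
    return [i + 1 for i in range(len(reference) - 1) if reference[i:i + 2] == "GC"]


def call_accessibility(reference, read):
    """Map each GpC cytosine to True (accessible), False (protected) or None.

    Assumes `read` is a bisulfite-converted top-strand read aligned
    base-for-base to `reference` (no indels); None marks an unexpected base.
    """
    if len(reference) != len(read):
        raise ValueError("read must be aligned to the reference without indels")
    calls = {}
    for pos in gpc_positions(reference):
        base = read[pos]
        calls[pos] = True if base == "C" else False if base == "T" else None
    return calls


def longest_protected_span(calls):
    """Length in bp from the first to the last GpC cytosine of the longest
    run of consecutive protected sites; ambiguous calls break a run."""
    best, start = 0, None
    for pos in sorted(calls):
        if calls[pos] is False:
            if start is None:
                start = pos
            best = max(best, pos - start + 1)
        else:
            start = None
    return best


if __name__ == "__main__":
    reference = "AGCTTGCAAGCTT"
    read = "AGCTTGTAAGTTT"  # first GpC methylated, the other two converted
    calls = call_accessibility(reference, read)
    print(calls, longest_protected_span(calls))
```

Because each read is scored on its own, long protected spans on individual molecules can be interpreted as nucleosome footprints, which is the information that a population-averaged MNase-ISO measurement cannot provide.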

Fig. 2.


Schematics of constructs for methods described in Table 1. Notations: mP = minimal promoter; ORF = open reading frame, usually a reporter gene; BC = barcode; pA = polyadenylation signal; LTR = long terminal repeat; NFR = nucleosome-free region; Ub = ubiquitin promoter; HygroR = hygromycin resistance gene; P = promoter; sgRNA = single-guide RNA; FAIRE = Formaldehyde-Assisted Isolation of Regulatory Elements fragments; TR = terminal repeats; TF = transcription factor

One study generated 70 promoter variants to study the effect of a nucleosome-disfavoring sequence, poly(A/T), on nucleosome occupancy and gene expression in budding yeast (Raveh-Sadka et al. 2012). By manipulating the length, composition, and location of the poly(A/T) tracts in the same promoter background, this study showed that stronger poly(A/T) leads to local depletion of nucleosomes, which correlates with stronger promoter activity. These observations led to the conclusion that poly(A/T) can be used for fine-tuning of gene expression level in yeast. While this early study constructed and measured the promoter variants individually, later works utilized synthetic oligonucleotides and next-generation sequencing, which allow simultaneous interrogation of much larger libraries. Levo et al. (2017) devised a synthetic reporter assay with 1500 promoter variants containing different combinations of 12 TF binding motifs and measured nucleosome occupancies on these promoter variants with Methyl-ISO, as well as their transcriptional activities. Their data revealed significant differences among TFs in terms of their chromatin-altering and transcription-driving capacities, and importantly, these two activities are not well correlated with each other. This is consistent with the functional division between PFs and classic transcription activators: the former tend to open chromatin without direct activation, while the latter bind passively to open regions and activate nearby genes (Bai et al. 2011). The difference in chromatin-altering activity among TFs was further illustrated by Yan et al. (2018). In this work, they designed > 15,000 synthetic oligonucleotides containing motifs for all known yeast TFs, integrated these oligos into a well-positioned nucleosome in the yeast genome, and measured nucleosome occupancy with MNase-ISO. TF binding sites that display lower nucleosome occupancy indicate higher nucleosome-displacement activity, or “pioneering” activity, of the corresponding TF. This assay categorized TFs into strong, weak, or non-PFs. Systematic comparison among these categories revealed that pioneering activity correlates closely with TF affinity and concentration, supporting a model in which strong TF binding facilitates its invasion into nucleosomes. Similar studies have been carried out in mouse embryonic stem cells (mESCs) (Hammelman et al. 2020). By integrating thousands of 100 bp sequences containing endogenous accessible regions or synthetic TF motifs and measuring accessibility with DNA methylation in mESCs and derived definitive endoderm cells, this work was able to identify TF motifs that are responsible for generating cell-type specific chromatin accessibility.
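A population-averaged occupancy score of the kind used to rank TFs in an MNase-ISO experiment can be sketched as follows. The normalization shown here (reads surviving MNase digestion divided by reads from undigested input DNA, then scaled to a motif-free control variant) is an assumption made for illustration, as are the variant names and the pseudocount; it is not necessarily the scheme used by the cited studies.

```python
def relative_occupancy(mnase_counts, input_counts, control, pseudocount=1.0):
    """Occupancy of each variant relative to a control variant.

    mnase_counts: reads per variant after MNase digestion
    input_counts: reads per variant from undigested input DNA
    control:      name of the reference variant (e.g. no TF motif)
    Values below 1 indicate nucleosome depletion relative to the control.
    """
    mnase_total = sum(mnase_counts.values())
    input_total = sum(input_counts.values())

    def enrichment(variant):
        # Depth-normalized survival of the variant through MNase digestion.
        digested = (mnase_counts.get(variant, 0) + pseudocount) / mnase_total
        undigested = (input_counts[variant] + pseudocount) / input_total
        return digested / undigested

    reference = enrichment(control)
    return {variant: enrichment(variant) / reference for variant in input_counts}


if __name__ == "__main__":
    mnase = {"control": 400, "motif_A": 100, "motif_B": 500}
    dna = {"control": 200, "motif_A": 200, "motif_B": 250}
    for name, value in relative_occupancy(mnase, dna, "control").items():
        print(name, round(value, 2))
```

In this toy example, `motif_A` would be read as a candidate pioneer-factor site because its occupancy falls well below that of the control, while `motif_B` leaves the nucleosome essentially intact.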

Another set of factors that regulate chromatin accessibility is CRs. Unlike RSC, which can be directly targeted through sequence motifs, most CRs do not have sequence specificity and may be recruited by TFs to further regulate chromatin opening. CRs may also function upstream of PF binding to facilitate nucleosome invasion by PFs. To understand the sequence of events and potential PF-CR specificity, Chen et al. (2022a) studied the coordination between all yeast PFs and four CRs: RSC, SWI/SNF, ISW2, and INO80. This work carried out Methyl-ISO assays to measure the extent of NDRs generated by individual PFs with or without a CR. They found that CRs are dispensable for nucleosome invasion by PFs and function downstream of PF invasion to modulate NDR lengths. They also found a few cases where CRs are specifically recruited by certain PFs.

Using Synthetic DNA Libraries to Investigate Promoter Activities

The core promoter is generally defined as the short sequence spanning −50 bp upstream to +50 bp downstream of the TSS, where the transcriptional pre-initiation complex (PIC) assembles (Haberle and Stark 2018). Core promoters are among the most well-characterized regulatory sequence classes, in part due to their predictable location relative to the gene TSS. The ability of an element to assemble the PIC is conferred by motifs found within the core promoter sequence. For example, the TATA box motif, which recruits the TATA-box binding protein within the TFIID complex, is found at −32 to −29 bp relative to the TSS in a subset of promoters (Ponjavic et al. 2006). Alternatively, the initiator (Inr) and downstream promoter element (DPE) often co-occur in TATA-less promoters (Burke and Kadonaga 1997; FitzGerald et al. 2006; Smale and Baltimore 1989). Other less abundant motifs enriched in core promoters have been identified computationally by analyzing databases of known promoter sequences across species (Calo and Wysocka 2013; FitzGerald et al. 2006; Roy and Singer 2015). These motifs have been exploited to construct synthetic super-promoters with ideal sequence configurations that confer potent gene expression (Even et al. 2016; Juven-Gershon et al. 2006). However, the vast majority of genomic promoters lack some or all defined core promoter motifs, illustrating the need to explore additional sequence features driving promoter function in vivo (Roy and Singer 2015).

Given that the ability of a promoter to initiate gene expression is predominantly encoded in its sequence, features stimulating or inhibiting promoter activity can be more efficiently dissected via synthetic DNA libraries. In fact, this approach is especially well-suited for promoter studies, given their relatively short lengths. Synthetic libraries have been employed to validate putative genomic promoters and dissect functional sequence features within genomic or synthetic promoters in diverse species and cellular contexts.

In one of the earliest examples, Myers et al. performed saturation mutagenesis of the mouse β-major globin gene promoter, cloned each mutant into a plasmid upstream of a mouse-human hybrid β-globin gene, and measured β-globin transcription driven by each mutant in HeLa cells (Myers et al. 1986). This approach resulted in the characterization of three sequence features required for transcriptional initiation of the β-globin gene: the CACCC box, CCAAT box, and the TATA box (Myers et al. 1986). The advent of high-throughput sequencing technologies enabled studies using much larger promoter libraries, which generally follow the experimental pipeline depicted in Fig. 1. For example, Kinney et al. designed a synthetic DNA library of ~ 200,000 mutated Escherichia coli lac promoter sequences driving the expression of GFP within a reporter plasmid (Kinney et al. 2010). E. coli cells transformed with the plasmid library were partitioned by FACS sorting of GFP expression levels and sequenced to classify mutant promoters, allowing identification of functional elements within the endogenous lac promoter (Kinney et al. 2010). In contrast to studies focusing on a single promoter in a specific cellular context (Kinney et al. 2010; Kotopka and Smolke 2020; Myers et al. 1986; Yu et al. 2021), many subsequent high-throughput promoter studies have interrogated diverse endogenous promoter libraries in a wide variety of species and cell types (Cooper et al. 2006; Hong and Cohen 2022; Hornung et al. 2012; Landolin et al. 2010; Lubliner et al. 2015; Mattioli et al. 2019; Renganaath et al. 2020; Trinklein et al. 2003; van Arensbergen et al. 2017; Weingarten-Gabbay et al. 2019). Additionally, several studies have explored the promoter activity conferred by a library of random or designed synthetic oligonucleotides (de Boer et al. 2020; Mogno et al. 2013; Sharon et al. 2012; Weingarten-Gabbay et al. 2019; Wu et al. 2019).
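For FACS-based readouts such as the sort-and-sequence experiment described above, one simple way to turn per-bin read counts into an expression estimate for each promoter variant is a weighted average over the sorting bins. The sketch below is an illustrative estimator under stated assumptions, not the analysis performed by Kinney et al. (2010): reads in each bin are normalized by that bin's sequencing depth and weighted by the fraction of cells sorted into the bin, and each bin is represented by a single log-fluorescence value. All argument names are hypothetical.

```python
def mean_expression(bin_reads, bin_totals, bin_cell_fraction, bin_log_fluorescence):
    """Estimate the mean log-fluorescence of one variant from sorted bins.

    bin_reads:            reads of this variant in each bin
    bin_totals:           total reads sequenced from each bin
    bin_cell_fraction:    fraction of all sorted cells that fell in each bin
    bin_log_fluorescence: representative log-fluorescence of each bin
    """
    lengths = {len(bin_reads), len(bin_totals),
               len(bin_cell_fraction), len(bin_log_fluorescence)}
    if len(lengths) != 1:
        raise ValueError("all inputs must have one entry per bin")
    # Estimated share of this variant's cells that landed in each bin.
    weights = [reads / total * fraction
               for reads, total, fraction
               in zip(bin_reads, bin_totals, bin_cell_fraction)]
    weight_sum = sum(weights)
    if weight_sum == 0:
        raise ValueError("variant was not observed in any bin")
    return sum(w * x for w, x in zip(weights, bin_log_fluorescence)) / weight_sum


if __name__ == "__main__":
    estimate = mean_expression(
        bin_reads=[10, 30, 0, 0],
        bin_totals=[100, 100, 100, 100],
        bin_cell_fraction=[0.25, 0.25, 0.25, 0.25],
        bin_log_fluorescence=[1.0, 2.0, 3.0, 4.0],
    )
    print(round(estimate, 3))
```

Normalizing by bin depth matters because bins are rarely sequenced equally deeply; without it, a deeply sequenced bin would pull every variant's estimate toward its own fluorescence value.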

There are now many different methods for designing promoter libraries (Table 1). The key considerations include the sequence length, library complexity, origin of library sequences (genomic or synthetic), chromatin context (integrated or extrachromosomal), and the readout method of promoter strength. While libraries employing endogenous sequences can, in general, test longer variable regions with a larger number of variations, synthetic sequences have the advantage of allowing researchers to systematically change motif number, spacing, orientation, affinity, or GC-content. Therefore, the former approach is commonly used for identifying functional promoter elements in the genome, while the latter is more suitable for systematically dissecting the variables influencing promoter activity. Using an extra-chromosomal approach allows fast screening of larger promoter libraries, while genomic integration (especially site-specific integration) can better recapitulate in vivo promoter activity and determine locus-specific effects for a smaller promoter library. For the readout, most published studies quantify promoter strength via RNA-seq of barcoded transcripts. In addition to its high throughput, a major advantage of this approach is that it can individually measure the activities of multiple promoters in the same cell. In contrast, if the promoter activity is measured based on the intensity of a fluorescent reporter driven by the library, extra experimental steps would be required to ensure that each cell contains only one library copy. The fluorescence approach, however, can accurately measure promoter activity in single cells, allowing the quantification of both average promoter strength and its cell-to-cell variability.
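The two quantities that a single-cell fluorescence readout makes available, average promoter strength and cell-to-cell variability, can be summarized as in the short sketch below. The use of the coefficient of variation (CV) and its square as the variability measure is a common convention assumed here for illustration; the function name and the example values are hypothetical.

```python
from statistics import mean, pstdev


def expression_summary(fluorescence):
    """Summarize single-cell reporter fluorescence for one promoter variant.

    Returns the mean (promoter strength), the coefficient of variation
    (standard deviation / mean) and the squared CV, often called noise.
    """
    if len(fluorescence) < 2:
        raise ValueError("need at least two cells to estimate variability")
    mu = mean(fluorescence)
    if mu <= 0:
        raise ValueError("mean fluorescence must be positive")
    cv = pstdev(fluorescence) / mu
    return {"mean": mu, "cv": cv, "noise": cv ** 2}


if __name__ == "__main__":
    # Two hypothetical promoters with the same mean but different variability.
    print(expression_summary([9.0, 10.0, 11.0]))
    print(expression_summary([2.0, 10.0, 18.0]))
```

A barcode-counting RNA-seq readout would report the same value for both example promoters, since it only captures the population average.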

The high-throughput promoter activity studies employing DNA libraries have significantly reinforced and expanded our understanding of the features driving promoter activities. It was found that nucleotide content is a key predictor of promoter activity, with CpG density, CG content, T/C-rich tracts in the scanning region, A-content at the TSS, and nucleosome-disfavoring A/T tracts all positively correlating with transcription level (Lubliner et al. 2015; van Arensbergen et al. 2017; Weingarten-Gabbay et al. 2019). Activator, TATA, Inr, and TF motifs were found to increase promoter activity, with multiple TF motifs and TF-TF cooperativity further enhancing gene expression (de Boer et al. 2020; Mogno et al. 2013; Renganaath et al. 2020; Sharon et al. 2012; Weingarten-Gabbay et al. 2019). Orientation, location, and sequence context surrounding TF motifs and promoter elements also contribute to their function (de Boer et al. 2020; Lubliner et al. 2015). Features found to repress promoter activity include repressor motifs, BRE upstream and downstream elements, mutations to TF motifs, and distance between TF motifs and the TSS (Sharon et al. 2012; Weingarten-Gabbay et al. 2019; Yu et al. 2021). Other interesting findings include a ~10-bp periodicity in the relationship between TF motif location and the TSS, a bidirectional transcriptional behavior of TF binding sites, and a unidirectional transcriptional behavior of core promoters (Sharon et al. 2012; Weingarten-Gabbay et al. 2019). Intriguingly, a recent study found that genomic locus, and therefore chromatin context and linear enhancer proximity, has little effect on intrinsic promoter activity (Hong and Cohen 2022). Regarding different promoter classes, high and low CG content promoters were found to display ubiquitous and cell-type specific gene expression, respectively, and cell-type specific promoters were predicted to be regulated by distal enhancers, a wider variety of TF motifs, and lower motif density compared to ubiquitous promoters (Landolin et al. 2010; Mattioli et al. 2019; van Arensbergen et al. 2017).

Using Synthetic DNA Libraries to Investigate Enhancer Activities

Transcription from a promoter can be activated or enhanced by distal enhancers. Enhancers are key regulators of spatiotemporal gene expression in organism development, environmental stimulus response, and disease progression. Therefore, it is important to identify enhancers within the genome and determine the mechanisms by which they are activated and subsequently initiate expression of a subset of genes in specific cellular contexts.

In higher eukaryotic cells, it can be challenging to locate and characterize enhancers because they can exist anywhere within the vast non-coding portion of the genome at variable distances from their target genes. However, enhancers exhibit some commonalities that can be exploited to facilitate their discovery (Catarino and Stark 2018; Panigrahi and O’Malley 2021; Shlyueva et al. 2014a). Enhancers tend to span a few hundred base pairs, and their underlying sequence and function are generally conserved across species (Chen et al. 2018; Hardison 2000; Pennacchio et al. 2006; Visel et al. 2008; Wong et al. 2020; Woolfe et al. 2004). They are composed of clusters of sequence-specific TF binding sites, and the combinatorial binding of TFs to a subset of these sites is thought to cause cell-type specific gene activation (Arnone and Davidson 1997; Arnosti and Kulkarni 2005; Chaudhari and Cohen 2018; Ren et al. 2000; Spitz and Furlong 2012; Vandel et al. 2019). Active enhancers tend to be associated with open chromatin, with neighboring nucleosomes decorated with H3K27ac and H3K4me1 modifications (Boyle et al. 2008; Buenrostro et al. 2013; Calo and Wysocka 2013; Giresi et al. 2007; Heintzman et al. 2009, 2007; Schones et al. 2008; Thurman et al. 2012; Yuan et al. 2005). They also have the ability to initiate transcription of their own sequences, producing enhancer RNAs (eRNAs), the detection of which has been shown to be the most accurate method for identifying active enhancers (De Santa et al. 2010; Henriques et al. 2018; Hirabayashi et al. 2019; Kim et al. 2010; The et al. 2014; Tippens et al. 2020; Yao et al. 2022). Putative enhancers have been identified in various cell contexts based on these common features (Chen et al. 2012; The et al. 2020; Tobias et al. 2021; Won et al. 2008; Yip et al. 2013). However, these predictions require in vivo experimental verification and mechanistic exploration.

Traditionally, putative enhancers have been validated by testing their ability to activate a minimal promoter within a reporter gene in transfected cells or a transgenic organism (Banerji et al. 1981). However, the low-throughput and time-consuming nature of this method prevented comprehensive analyses of putative enhancers within the genome. In the past decade, several high-throughput techniques employing synthetic DNA libraries have been designed and exploited to quantify the enhancer activity of massive libraries of sequences in various contexts. Similar to promoter libraries, the key design features of the enhancer libraries include the length, size, sequence origin, transfection protocols, and readout methods. These considerations are detailed below.

Massively parallel reporter assays (MPRAs) measure the enhancer activities of hundreds to thousands of DNA oligonucleotides cloned upstream of a minimal promoter driving expression of a reporter gene within an episomal plasmid (Fig. 1AB and Fig. 2) (Melnikov et al. 2012; Patwardhan et al. 2012). The oligonucleotide library used for an MPRA can be either chemically synthesized (Chaudhari and Cohen 2018; Chen et al. 2022b; Doni Jayavelu et al. 2020; Erceg et al. 2014; Ernst et al. 2016; Fiore and Cohen 2016; Grossman et al. 2017; Inoue et al. 2017, 2019; Kheradpour et al. 2013; Klein et al. 2018b; Kreimer et al. 2022; Kwasnieski et al. 2014; Maricque et al. 2016; Melnikov et al. 2012; Nguyen et al. 2016; Patwardhan et al. 2012; Sahu et al. 2022; Smith et al. 2013; Waters et al. 2021; White et al. 2016, 2013) or derived from sheared genomic DNA (Arnold et al. 2014, 2013; Barakat et al. 2018; Dickel et al. 2014; Liu et al. 2017a; Muerdter et al. 2018; Murtha et al. 2014; Neumayr et al. 2022; Pang and Snyder 2020; Peng et al. 2020; Sahu et al. 2022; Shen et al. 2016; Shlyueva et al. 2014b; Vanhille et al. 2015; Wang et al. 2018; Yáñez-Cuna et al. 2014). In the original MPRA method, each sequence in the oligonucleotide library is assigned to a unique barcode, which is placed in the 3’ UTR of the reporter gene (Melnikov et al. 2012; Patwardhan et al. 2012). After transfecting the plasmid library into living cells or introducing it into an organism, RNA-seq is performed to quantify the enhancer activity of each oligonucleotide by the transcription of its barcode (Fig. 1CE).
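As a rough illustration of the barcode-counting readout described above, the following Python sketch converts barcode read counts into a per-element activity score. Normalizing RNA barcode counts to plasmid DNA barcode counts, pooling barcodes by element, and adding a pseudocount are common analysis choices assumed here rather than steps specified in this review; all function and variable names are hypothetical.

```python
import math
from collections import defaultdict


def mpra_activity(rna_counts, dna_counts, barcode_to_element, pseudocount=1.0):
    """Return log2(RNA fraction / DNA fraction) for each library element.

    rna_counts / dna_counts: dict barcode -> read count (assumed inputs).
    barcode_to_element: dict barcode -> element name (the barcode assignment).
    """
    rna_total = sum(rna_counts.values())
    dna_total = sum(dna_counts.values())

    # Pool the reads of all barcodes assigned to the same element.
    rna_by_element = defaultdict(float)
    dna_by_element = defaultdict(float)
    for barcode, element in barcode_to_element.items():
        rna_by_element[element] += rna_counts.get(barcode, 0)
        dna_by_element[element] += dna_counts.get(barcode, 0)

    activity = {}
    for element in dna_by_element:
        # Depth-normalized fractions; the pseudocount avoids log of zero.
        rna_frac = (rna_by_element[element] + pseudocount) / rna_total
        dna_frac = (dna_by_element[element] + pseudocount) / dna_total
        activity[element] = math.log2(rna_frac / dna_frac)
    return activity


if __name__ == "__main__":
    barcodes = {"AAA": "e1", "CCC": "e1", "GGG": "e2"}
    rna = {"AAA": 30, "CCC": 50, "GGG": 20}
    dna = {"AAA": 25, "CCC": 25, "GGG": 50}
    print(mpra_activity(rna, dna, barcodes))
```

In this toy input, element e1 is over-represented in RNA relative to DNA and receives a positive score, whereas e2 receives a negative one. Real analyses typically also filter low-count barcodes and use replicates, which this sketch omits.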

MPRA libraries composed of synthetic oligonucleotides are entirely customizable, allowing the enhancer activity of wild type, mutated, or artificial elements to be studied. For example, DNA libraries synthesized for MPRAs have been designed to experimentally validate the activity of putative enhancers (Doni Jayavelu et al. 2020; Fiore and Cohen 2016; Inoue et al. 2017, 2019; Kheradpour et al. 2013; Klein et al. 2018b; Kwasnieski et al. 2014; Maricque et al. 2016; Nguyen et al. 2016; Sahu et al. 2022; White et al. 2016, 2013), systematically dissect activating or repressive elements within individual enhancers (Chen et al. 2022b; Ernst et al. 2016; Kreimer et al. 2022; Melnikov et al. 2012; Patwardhan et al. 2012), and explore sequence features conferring enhancer activity to artificially engineered elements (Chaudhari and Cohen 2018; Erceg et al. 2014; Grossman et al. 2017; Smith et al. 2013; Waters et al. 2021; White et al. 2016). There now exists a wealth of publicly available MPRA datasets in diverse cellular contexts. One meta-analysis combined several MPRA datasets that are based on synthetic oligonucleotides derived from endogenous human genomic sequences and found that chromatin accessibility and number of TF binding sites were the most predictive features of enhancer activity (Kreimer et al. 2019). Interestingly, cell-type specific features did not improve model performance, and chromosomal MPRA data sets outperformed episomal data sets in terms of predictive power.

One limitation of MPRAs utilizing synthetic DNA libraries is the restriction of sequence length (<230 bp) and complexity (<250,000) due to the current synthesis technology. These limitations can be lifted by using sheared genomic DNA instead of synthetic oligonucleotides, as seen in one variation of MPRA, self-transcribing active regulatory region with sequencing (STARR-seq) (Arnold et al. 2013; Muerdter et al. 2015). To create the STARR-seq plasmid library, isolated genomic DNA is fragmented, ligated to adaptors, and inserted into the STARR-seq vector (Fig. 1AB and Fig. 2). STARR-seq takes advantage of the fact that enhancer sequences can function irrespective of directionality and distance from the promoter by positioning the putative enhancer sequence in the 3’ UTR of a reporter gene driven by a minimal promoter (Fig. 1B). Upon transfecting the plasmid library into living cells, fragments functioning as active enhancers will induce transcription of the reporter gene including their own sequence, enabling enhancer identification and quantification by RNA-seq (Fig. 1CE).

The removal of the need for a barcode allows STARR-seq to be performed on millions of sheared genomic DNA fragments, which has facilitated the genome-wide discovery of enhancers in various organisms and cellular contexts. However, even libraries on the scale of millions may fail to identify many active enhancers. For example, one study employing STARR-seq to measure the enhancer activity of over 1 million 600 bp fragments spanning 1 Mb of the human genome found only six active enhancers upon transfection into HeLa cells (Arnold et al. 2013). Given the large size of the human genome and comparatively small number and size of active enhancers in each cell type, the probability of positive hits is very small. One variation to STARR-seq called CapStarr-seq combats this issue by enriching the library for putative enhancers (Vanhille et al. 2015). In this method, putative enhancers are captured by hybridizing genomic DNA fragments (~ 400 bp) to a custom oligonucleotide microarray and used in a STARR-seq assay (Vanhille et al. 2015).

Another type of MPRA called Functional Identification of Regulatory Elements within Accessible Chromatin (FIREWACh) takes advantage of the fact that active enhancers reside in open chromatin to focus the library on accessible nucleosome-free regions (NFRs) (Murtha et al. 2014). In this method, permeabilized nuclei are treated with restriction enzymes that preferentially digest nucleosome-free DNA, allowing isolation of short NFR fragments. These fragments are then cloned into a lentiviral reporter plasmid to drive the expression of GFP, transduced into cells, and sorted based on GFP fluorescence using FACS (Fig. 2). Amplicon sequencing on GFP+ cells allows identification of NFR fragments with enhancer activities. In mESCs, FIREWACh was able to identify 6,364 putative CREs within NFRs. While many identified CREs overlap with annotated mESC promoters, distal elements (>2 kb from a TSS) are enriched for ESC-specific TF motifs and H3K4me1 and H3K27ac modifications. Furthermore, these putative enhancer elements were shown to be cell-type specific, overlapped with previously predicted or verified enhancers in mESCs, and were able to activate transcription at variable distances from gene promoters (Murtha et al. 2014).

One important consideration of MPRAs is that enhancers in an episomal context may not accurately recapitulate their endogenous activity. To address this, adeno-associated (Nguyen et al. 2016; Shen et al. 2016), lentiviral (Inoue et al. 2017; Murtha et al. 2014), transposition (Akhtar et al. 2013), and site-specific (Dickel et al. 2014) MPRAs have been employed to integrate the reporter library into the genome at one or multiple loci. Many of these techniques, including lentiviral transduction, randomly integrate oligonucleotide reporters throughout the genome, which may introduce locus-specific bias into the enhancer activity readout. In contrast, site-specific library integration via homologous recombination eliminates the variable effect from the surrounding chromatin context but may suffer from lower integration efficiency.

Using Synthetic DNA Libraries to Investigate Enhancer-Promoter Specificity

Enhancers regulate the expression of genes by interacting with and enhancing the expression of their target promoters. It is not well understood how enhancer-promoter interactions are established, and it is challenging to predict which genes an enhancer will regulate because they can exist at variably large distances (up to 1 Mb) from one another (Monfils and Barakat 2021). It is believed that spatial or linear proximity between an enhancer and promoter influences their compatibility with one another, but studies have found additional factors that may contribute to specificity, including the biochemical compatibility of TFs bound to enhancers and promoters (Calhoun et al. 2002; van Arensbergen et al. 2014). Synthetic DNA libraries can be exploited to investigate the rules governing promoter-enhancer specificity.

The classic MPRA design assumes that all enhancers are able to activate reporter gene expression from a common minimal promoter to the same extent, but several studies have shown that not all enhancers and promoters are compatible with one another (Butler and Kadonaga 2001; Li and Noll 1994; Merli et al. 1996; Ohtsuki et al. 1998; Sharpe et al. 1998; Wefald et al. 1990). This can result in false negative results for enhancer sequences that are incompatible with the promoter used in the MPRA. Therefore, it is important to test an enhancer’s ability to initiate reporter gene expression from different promoters. This approach was first tested in Drosophila melanogaster using STARR-seq, where the expression of a reporter gene driven from either a housekeeping gene core promoter or modified developmental core promoter was measured in parallel using the same putative enhancer library to assess features influencing promoter-enhancer compatibility (Zabidi et al. 2015). This study found populations of enhancers that function preferentially with either housekeeping promoters or developmental promoters, which was shown to be partly explained by differential transcription factors binding to the two promoter classes.

Two more recent studies on this topic achieved unprecedented numbers of enhancer-promoter pairings, providing new insights into the rules governing enhancer specificity. Martinez-Ara et al. investigated the ability of 556 CREs to boost the activity of 84 promoters derived from genomic loci surrounding three pluripotency genes in mESCs, using both the traditional MPRA and STARR-seq approaches (Martinez-Ara et al. 2022). They found that 21% of CREs boosted transcription, 17% repressed transcription, and the rest did not significantly affect transcription. They argued that CRE-promoter selectivity cannot be explained by promoter class or their endogenous interaction frequency, but instead may be mediated by combinatorial TF binding. In another study, Bergman et al. utilized STARR-seq to measure reporter gene expression by combinations of 1,000 putative CREs and 1,000 promoters from across the genome (Bergman et al. 2022). Basal promoter and CRE activity correlated with endogenous chromatin accessibility, H3K27ac, and RNA transcription. In contrast to Martinez-Ara et al., Bergman et al. found that most CREs indiscriminately enhance the activity of promoters. However, both studies found that the ability of a CRE to enhance reporter gene expression is anti-correlated with basal promoter activity, i.e. promoters possessing high intrinsic activity tend to be less affected by CRE identity. Conversely, Bergman et al. found that enhancers with higher basal activity were able to activate a wider range of promoters. Bergman et al. also recapitulated the finding in Zabidi et al. that certain enhancers are only able to activate genes driven by either housekeeping promoters or developmental promoters and found that strong, enhancer-unresponsive promoters tend to initiate expression of housekeeping genes. Taken together, the final transcription readout has a complex dependence on promoter type, TF binding pattern on CREs and promoters, and basal CRE and promoter activities.

CRISPR-Based Approaches Using Synthetic gRNA Libraries

Most methods described above measure CRE activity in an episomal context or integrated outside of the native chromatin context. However, it can be useful to measure CRE activity in its endogenous locus. This is especially important for understanding enhancer-promoter communication, which likely depends on 3D chromosome organization and chromatin context. Endogenous CRE activity measurements can be achieved by genetic perturbation of these sequences through CRISPR-mediated approaches (Klein et al. 2018a; Lopes et al. 2016). For example, the non-homologous end joining (NHEJ) repair pathway can be exploited to introduce small indels after generating a double-strand break at a specific locus using CRISPR-Cas9. When hitting a key CRE, such indels can have a dramatic effect on the enhancer activity. Along this line, CREs conferring or disrupting enhancer activity can be systematically dissected by in situ saturating mutagenesis of an enhancer region (Canver et al. 2015; Diao et al. 2016; Dixit et al. 2016; Korkmaz et al. 2016; Rajagopal et al. 2016; Sanjana et al. 2016). In this approach, a massive pool of sgRNA sequences tiling the entire targeted region is constructed, cloned into a viral vector, and transduced into cells such that only one sgRNA is integrated into each cell (Fig. 2). Functional CREs can then be identified based on perturbation-induced target gene expression changes.

Other CRISPR-based approaches have been engineered to reversibly alter the epigenetic environment surrounding a region of interest. In these methods, catalytically dead Cas9 (dCas9) is fused to an activator domain (e.g. VP16) or repressor domain (e.g. KRAB) and recruited to CREs by the sgRNA library (CRISPRa/i) (Chavez et al. 2015; Chen et al. 2022b; Fitz et al. 2020; Fulco et al. 2019; Gilbert et al. 2013; Jinek et al. 2012; Klann et al. 2017; Perez-Pinera et al. 2013; Qi et al. 2013). These factors will modify the chromatin environment surrounding the targeted CREs, altering their activity and subsequently affecting the expression level of target genes.

One limitation of CRISPR-based approaches is the requirement for a PAM sequence downstream of each Cas9 or dCas9 targeting site, so the set of targetable positions is constrained by the occurrence of PAMs within the region of interest. This limits the tiling density of a region of interest to a resolution of 4–10 bp, on average, which prevents comprehensive analysis of the effect of each nucleotide on enhancer activity (Canver et al. 2015; Korkmaz et al. 2016). However, this resolution constraint may be resolved by utilizing Cas9 orthologues and variants with different PAM dependencies (Collias and Beisel 2021; Esvelt et al. 2013; Kleinstiver et al. 2015a, 2015b).
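The PAM constraint on tiling density can be made concrete with a short Python sketch that enumerates candidate guides in a region and reports the spacing between adjacent predicted cut sites. It assumes the canonical SpCas9 rules (an NGG PAM immediately 3' of a 20-nt protospacer, with the cut 3 bp upstream of the PAM); the function names are hypothetical, and the sketch ignores guide quality and off-target filtering.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse-complement an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def spcas9_guides(seq, guide_len=20):
    """Return sorted (cut_position, strand, protospacer) for every NGG PAM.

    cut_position is the number of forward-strand bases to the left of the
    predicted break, so both strands share one coordinate system.
    """
    seq = seq.upper()
    n = len(seq)
    guides = []
    for strand, template in (("+", seq), ("-", reverse_complement(seq))):
        # Zero-width lookahead so that overlapping PAMs are all reported.
        for match in re.finditer(r"(?=[ACGT]GG)", template):
            pam_start = match.start()
            if pam_start < guide_len:
                continue  # not enough sequence for a full protospacer
            protospacer = template[pam_start - guide_len:pam_start]
            cut = pam_start - 3  # break 3 bp upstream of the PAM
            if strand == "-":
                cut = n - cut  # map back to forward-strand coordinates
            guides.append((cut, strand, protospacer))
    return sorted(guides)


def cut_site_gaps(guides):
    """Distances between adjacent distinct cut positions."""
    cuts = sorted({cut for cut, _, _ in guides})
    return [b - a for a, b in zip(cuts, cuts[1:])]


if __name__ == "__main__":
    region = "ACGTACGTACGTACGTACGTTGGCATGCATGCATGCATGCATGCAAGGT"
    found = spcas9_guides(region)
    print(found)
    print(cut_site_gaps(found))
```

Large values returned by cut_site_gaps mark stretches of the region that cannot be perturbed with this PAM, which is where alternative Cas9 variants would be most useful.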

Initial CRISPR-mediated mutagenesis experiments were coupled with low-throughput readout methods like qPCR, Fluorescence-Activated Cell Sorting (FACS), and phenotypic selection, which are best suited for analyzing the CREs influencing one or a few chosen genes (Canver et al. 2015; Diao et al. 2016; Korkmaz et al. 2016; Rajagopal et al. 2016; Sanjana et al. 2016). In an example of a CRISPR study using phenotypic selection, Fulco et al. utilized tiling CRISPRi perturbation to identify CREs regulating the expression of GATA1 and MYC within a ~ 1 Mbp window (Fulco et al. 2019). These genes both affect K562 erythroleukemia cell proliferation in a dose-dependent manner, such that CREs affecting GATA1 and MYC expression in K562 cells can be characterized using a proliferation-based pooled assay. Several functional CREs were identified to reduce K562 proliferation, which tend to overlap with DNase I hypersensitive sites, associate with relevant TFs, exhibit sequence conservation, and be spatially proximal to the target gene. They further showed that the ability of an enhancer to regulate expression of a gene can be predicted by considering both the frequency of its 3D contacts with the gene promoter and its intrinsic activity, approximated by DNase I hypersensitivity and H3K27ac levels.
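One simple way to express the idea that an enhancer's effect depends on both its intrinsic activity and its contact frequency with the promoter is sketched below in Python. The specific choices here, taking activity as the geometric mean of the DNase I and H3K27ac signals and normalizing each element's activity-by-contact product by the sum over all candidate elements for the gene, are assumptions of this illustration and are not spelled out in the text above; names and input values are hypothetical.

```python
import math


def activity_contact_scores(elements):
    """Score candidate elements for one gene.

    elements: dict name -> (dnase_signal, h3k27ac_signal, contact_frequency).
    Returns dict name -> share of the summed activity x contact product.
    """
    raw = {}
    for name, (dnase, h3k27ac, contact) in elements.items():
        activity = math.sqrt(dnase * h3k27ac)  # geometric mean of the two marks
        raw[name] = activity * contact
    total = sum(raw.values())
    if total == 0:
        return {name: 0.0 for name in raw}
    return {name: value / total for name, value in raw.items()}


if __name__ == "__main__":
    candidates = {
        "e1": (4.0, 9.0, 0.5),   # active, moderate contact
        "e2": (1.0, 1.0, 1.0),   # weak, frequent contact
        "e3": (16.0, 1.0, 0.0),  # active, never contacts the promoter
    }
    print(activity_contact_scores(candidates))
```

Under this scheme an element with strong chromatin marks but no promoter contact (e3) receives a score of zero, while a weaker element in frequent contact can still contribute.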

The methods above are suitable for studying the regulation of one or a few genes. Recent advances combining Cas9-mediated mutagenesis with single-cell RNA-seq (scRNA-seq) like CRISP-seq, CROP-seq, and Perturb-seq have enabled the identification of all genes affected by perturbing specific CREs (Datlinger et al. 2017; Dixit et al. 2016; Jaitin et al. 2016). scRNA-seq allows the transcriptome and gRNA identity to be established in each cell, which enables linking each Cas9-mediated perturbation to genome-wide gene expression changes. For example, Gasperini et al. characterized hundreds of enhancer-gene pairs within K562 cells using high-throughput CRISPRi perturbation and scRNA-seq (Gasperini et al. 2019). Analyzing 254,974 single-cell transcriptomes upon perturbing 5,920 putative enhancers with unknown gene targets led to the discovery of 664 enhancer-gene pairs in K562 cells. Interestingly, in contrast to the findings by Fulco et al., most enhancer-gene pairs, though enriched for linear and spatial proximity, were not identified as contacts in Hi-C datasets, and about one-third of the pairs did not exist within the same TAD. In the future, such multiplexed, single-cell CRISPR-based approaches will be helpful in establishing enhancer-gene pairings within diverse cell contexts.

The strategies described above for studying CRE activity have revealed many interesting features of regulatory sequences. For example, only a small fraction of enhancers predicted from ENCODE features exhibit activity in MPRAs, suggesting that common features used to predict enhancers may be required but not sufficient for their activity (Kwasnieski et al. 2014). Additionally, most nucleotides within a CRE are non-essential for its function, and only discrete elements within a CRE are critical determinants of its activity (Canver et al. 2015; Korkmaz et al. 2016; Rajagopal et al. 2016). Most motifs and fragments responsible for CRE activity have been identified as TF binding sites and clusters. Endogenous chromatin accessibility, activating histone modifications, RNAPII/p300 binding, enhancer-promoter contact, and eRNA transcriptional activity have also been identified as features of active CREs. Additionally, evolutionary sequence conservation and GC content have been found to be predictive of CRE activity (Diao et al. 2016; Ernst et al. 2016; Maricque et al. 2016; Shen et al. 2016; Wang et al. 2018; White et al. 2013). Future meta-analyses integrating the results from multiple high-throughput MPRA and CRISPR-based studies employing DNA libraries within diverse species and cell contexts will ultimately lead to a more complete understanding of CRE activity and can potentially improve CRE predictions.

Using Synthetic DNA Libraries to Investigate Dual Promoter-Enhancer Activity

Interestingly, several studies have shown that enhancers can exhibit promoter activity and vice versa (Andersson and Sandelin 2020; Nguyen et al. 2016). As mentioned above, active enhancers can act as promoters by initiating synthesis of bidirectional transcripts called eRNAs, and promoter sequences are able to enhance transcription of distal genes. To distinguish sequence features specifically encoding promoter or enhancer activities, Nguyen et al. directly compared the functions of thousands of distal enhancer and promoter sequences exhibiting CREBBP binding and neural activation in mouse cortical neurons using MPRAs (Nguyen et al. 2016). In this study, the same DNA library was cloned upstream of GFP to measure promoter activity or downstream of GFP to measure enhancer activity via STARR-seq (Nguyen et al. 2016). They found that promoter-/enhancer-derived sequences exhibited promoter and enhancer activities, respectively, as expected. Interestingly, all sequences displayed comparable levels of enhancer activity, but promoter-derived sequences possessed greater promoter activity than enhancer-derived ones, suggesting that promoter activity may depend on unique sequence features, which may include CpG dinucleotides and certain sequence-specific TF motifs.

Several other recent studies employing MPRAs or CRISPR-based approaches have corroborated the finding that promoters can enhance the expression of distal genes (Dao et al. 2017; Diao et al. 2017; Engreitz et al. 2016). For example, Diao et al. developed a CRISPR-based approach called Cis-Regulatory Element Scan by Tiling-deletion and sequencing (CREST-seq) to categorize enhancers that affect POU5F1-eGFP expression within 2 Mbp of the POU5F1 locus in human ESCs, and the identified CREs overlap with 17 annotated promoters for functionally unrelated genes (Diao et al. 2017). This study found that the predominant factor influencing enhancer activity by a promoter was long-range chromatin interactions between the promoter and POU5F1-eGFP, more than TF binding sites, histone modifications, or gene expression. Dao et al. later utilized CapStarr-seq to systematically characterize mammalian promoters exhibiting enhancer activities (Dao et al. 2017; Vanhille et al. 2015), which discovered such activities among 2–3% of all coding-gene promoters in a single cell type. Enhancer-like promoters were enriched for enhancer properties including H3K4me1, H3K27ac, p300 binding, and unstable divergent transcription. Additionally, these promoters were correlated with stress-response genes, bound by stress-response-related TFs, and enriched for TF motif clusters, resembling enhancers. Enhancer-like promoters also engaged in long-range promoter-promoter interactions more often than control promoters, recapitulating the findings in Diao et al. Overall, these results indicate that a significant portion of genome-wide CREs can exhibit dual promoter and enhancer activities, suggesting that the binary enhancer-promoter classification system may not be entirely accurate (Andersson and Sandelin 2020; Kim and Shiekhattar 2015).

Using Synthetic DNA Libraries to Investigate Untranslated Region (UTR) Function and Post-Transcriptional Regulation

While TFs, nucleosomes, and CREs control transcription, gene expression is also regulated post-transcriptionally. The processing, nuclear export, localization, translation, and stability of mRNA are all under strict control, which allows fine-tuning of protein levels (Keene 2007). Post-transcriptional control is therefore a critical component of gene regulation. Disruptions of these steps are linked to developmental defects and human diseases including cancer, chronic inflammation, and neurodegenerative disease (Audic and Hartley 2004; Corbett 2018; Khabar 2010; Szaro and Strong 2010).

Untranslated regions (UTRs) of mRNAs are important players in post-transcriptional regulation. The 5’UTR mainly controls translation efficiency via its primary and secondary RNA structures, upstream open reading frames (uORFs) and internal ribosome entry sites (IRESs) (Hinnebusch et al. 2016; Leppek et al. 2018; Mignone et al. 2002). The 3’UTR plays a major regulatory role in the fates of mRNAs, including translation, subcellular localization, and degradation, via its CREs (Matoulkova et al. 2012; Mayr 2017; Mignone et al. 2002). Besides the direct effect of sequences and secondary structures, the functions of UTRs are mostly achieved by trans-acting factors including microRNAs (miRNAs) and RNA-binding proteins (RBPs) that recognize regulatory sequences within UTRs (Cai et al. 2009; Glisovic et al. 2008; Matoulkova et al. 2012). Most existing studies of UTR functions are conducted on individual genes, but globally characterizing UTR and post-transcriptional regulation remains challenging as different genes vary in their transcription levels and gene structures.

The synthetic DNA library method allows the identification of post-transcriptional regulatory sequences within UTRs without the confounding effects from different gene contexts. Generally, oligonucleotide pools containing candidate UTR regulatory elements are synthesized and cloned upstream or downstream of the open reading frame (ORF) of a fluorescent reporter in an expression vector (Fig. 1AB). The plasmid library is integrated into the genome as single copies, where the reporter genes with variable UTRs are differentially expressed (Fig. 1C). The cells are then sorted into different expression bins based on their fluorescence intensity, followed by DNA extraction and high-throughput sequencing to link expression levels to specific library sequences (Fig. 1DE) (Cao et al. 2021; Dvir et al. 2013; Noderer et al. 2014; Oikonomou et al. 2014; Vainberg Slutskin et al. 2018; Wissink et al. 2016).
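The sort-and-sequence readout described above can be summarized computationally as follows. This Python sketch estimates a mean expression level for each library sequence from its read distribution across sorting bins. Weighting each bin by the fraction of cells sorted into it and using one representative log-fluorescence value per bin are assumptions of this illustration, not details given in the cited studies; all names are hypothetical.

```python
def sortseq_mean_expression(bin_reads, bin_cell_fraction, bin_log_fluorescence):
    """Estimate mean log fluorescence per variant from binned read counts.

    bin_reads: dict variant -> list of read counts, one entry per bin.
    bin_cell_fraction: fraction of all sorted cells that fell in each bin.
    bin_log_fluorescence: representative log fluorescence of each bin.
    """
    n_bins = len(bin_log_fluorescence)
    # Total reads per bin, used to correct for unequal sequencing depth.
    depth = [sum(reads[b] for reads in bin_reads.values()) for b in range(n_bins)]

    estimates = {}
    for variant, reads in bin_reads.items():
        # Estimated share of the cell population carrying this variant in each bin.
        weights = [
            bin_cell_fraction[b] * reads[b] / depth[b] if depth[b] else 0.0
            for b in range(n_bins)
        ]
        total = sum(weights)
        if total == 0:
            continue  # variant not observed in any bin
        estimates[variant] = (
            sum(w * f for w, f in zip(weights, bin_log_fluorescence)) / total
        )
    return estimates


if __name__ == "__main__":
    reads = {"utr_low": [90, 10, 0], "utr_high": [10, 90, 100]}
    print(sortseq_mean_expression(reads, [0.25, 0.5, 0.25], [1.0, 2.0, 3.0]))
```

A variant whose reads concentrate in the dimmest bin receives a low estimate, and one spread toward the brightest bins receives a higher one; more bins give finer resolution at the cost of more sorting and sequencing.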

The experimental design above measures the effect of library sequences on protein production, which may result from regulation of translation efficiency and/or mRNA stability. To distinguish between these two processes, several other experimental procedures have been used. One method is to quantify DNA library and mRNA abundance (instead of protein level) from the same cell and estimate mRNA decay rate from their ratio (Griesemer et al. 2021; Litterman et al. 2019). Alternatively, a few studies have used the “Tet-Off” system to drive the expression of their sequence libraries so that transcription can be turned off with doxycycline addition, and mRNA degradation of each sequence can be directly measured over time (Siegel et al. 2022; Zhao et al. 2014). Another method, employed in zebrafish embryos, involves generating an mRNA library by in vitro transcription, injecting it into the embryos, and collecting and sequencing the RNAs over time to analyze the kinetics of mRNA degradation (Rabani et al. 2017; Yartseva et al. 2017). Finally, to directly assay translation efficiency of the library sequences, polysome profiling can be used to capture actively translating mRNA (Lim et al. 2021; Sample et al. 2019).
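For the time-course designs described above, in which transcription is shut off or mRNA is injected and abundance is followed over time, a decay rate can be estimated per sequence. The Python sketch below assumes simple first-order (single-exponential) decay and fits it by least squares on log-transformed abundances; this model and the function name are assumptions of the illustration, and the cited studies may use more elaborate kinetic models.

```python
import math


def fit_half_life(times, abundances):
    """Fit ln(abundance) = ln(A0) - k * t and return (k, half_life).

    times: sampling times after transcription shut-off (any consistent unit).
    abundances: positive, normalized mRNA abundances at those times.
    """
    logs = [math.log(a) for a in abundances]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    k = -sxy / sxx  # decay rate is the negative slope
    if k <= 0:
        return k, math.inf  # no measurable decay over the time course
    return k, math.log(2) / k


if __name__ == "__main__":
    hours = [0, 1, 2, 4]
    signal = [100 * 0.5 ** t for t in hours]  # synthetic data, half-life of 1 h
    print(fit_half_life(hours, signal))
```

Comparing fitted half-lives between library sequences then separates elements acting on mRNA stability from those acting on translation, provided abundances are normalized to a stable spike-in or reference.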

These studies have greatly contributed to our understanding of how UTRs regulate gene expression post-transcriptionally. Many novel CREs were discovered, especially inside 3’UTRs (Oikonomou et al. 2014; Wissink et al. 2016; Zhao et al. 2014). For example, by testing a library of random 8-nucleotide sequences (8-mers) that are inserted into the 3’UTR of the human IQGAP1 gene, Wissink et al. found hundreds of novel regulatory sequences (Wissink et al. 2016). Contrary to the popular belief that 3’UTRs are usually repressive, most of these newly found elements are activating and function by modifying mRNA stability. The diverse and customized design of synthetic DNA libraries also enabled more comprehensive and systematic analysis of previously well-established regulatory elements such as Kozak consensus (Dvir et al. 2013; Noderer et al. 2014) and AU-rich element (ARE) (Siegel et al. 2022). As an important trans-regulator of UTRs, miRNA has also been extensively studied using synthetic DNA libraries (Vainberg Slutskin et al. 2018; Yartseva et al. 2017). The miRNA sequence, hybridization energy, target accessibility, and target multiplicity were shown to control the repression activity of miRNAs (Vainberg Slutskin et al. 2018). Results from these studies not only improve the annotation of the non-coding genome, but also help build models to predict post-transcriptional regulation based on sequences (Dvir et al. 2013; Noderer et al. 2014; Sample et al. 2019; Vainberg Slutskin et al. 2018). Using these data and models, researchers were able to design synthetic UTR sequences to achieve optimal expression levels (Cao et al. 2021; Sample et al. 2019) or specific mRNA dynamics (Rabani et al. 2017). They also allow researchers to interpret the effect of genetic variation and its association with phenotypes (Griesemer et al. 2021; Lim et al. 2021; Zhao et al. 2014).

Synthetic DNA libraries have also been applied to studying other types of post-transcriptional regulation. To systematically identify human and virus IRES sequences which mediate cap-independent translation, Weingarten-Gabbay et al. devised a high-throughput reporter assay using a synthetic DNA library, which uncovered thousands of new IRES sequences (Weingarten-Gabbay et al. 2016). Further characterization of these regulatory sequences potentially leads to new mechanisms underlying cap-independent translation. RNA splicing and RNA localization have also been investigated using synthetic DNA libraries. These studies discovered sequence determinants of alternative splicing (Mikl et al. 2019; Rosenberg et al. 2015), as well as regulatory elements directing RNA molecules into the nucleus (Lubelsky and Ulitsky 2018; Shukla et al. 2018) or to the neurites (Arora et al. 2022).

Using Synthetic DNA Libraries to Investigate Genetic Variation Affecting Gene Expression

Genetic variation among individuals is a key determinant of population diversity in phenotypes and disease-susceptibility. Most disease-related sequence variants characterized by genome-wide association studies (GWAS) are located in non-coding regions of the genome (Maurano et al. 2012), suggesting the potential roles they play in regulating gene expression. However, evaluation of the regulatory impact of these genetic variants is often non-trivial, which leads to high experimental burden.

Synthetic DNA libraries have been applied to study the gene regulatory function of genetic variation in a high-throughput manner. In these studies, DNA library sequences containing different variant alleles are designed and synthesized, followed by experimental procedures like MPRA (Doan et al. 2016; Matoba et al. 2020; Mouri et al. 2022; Mulvey and Dougherty 2021; Myint et al. 2020; Tewhey et al. 2016; Ulirsch et al. 2016; van Arensbergen et al. 2019) or STARR-seq (Klein et al. 2019; Liu et al. 2017b; Vockley et al. 2015) to measure their regulatory activities. These methods are highly efficient for pinpointing potential causal variants. For example, Mouri et al. used an MPRA to test the cis-regulatory effects of 18,312 variants associated with five autoimmune diseases and identified 60 variants that are likely to be causal (Mouri et al. 2022). They further characterized one of the identified variants both in vitro and in vivo and found that it affects downstream gene expression, naive T cell activation, and autoimmune disease risk. Similar studies have also been conducted with genetic variants associated with psychiatric, neurodevelopmental and neurodegenerative disorders (Doan et al. 2016; Lagunas et al. 2021; Matoba et al. 2020; Mulvey and Dougherty 2021; Myint et al. 2020), osteoarthritis (Klein et al. 2019) and cancer (Lim et al. 2021; Liu et al. 2017b), providing a framework for understanding GWAS variants, which can then be followed up by mechanistic studies.
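To illustrate how such assays compare variant alleles, the Python sketch below computes an allelic effect as the difference in mean log2 RNA/DNA activity between the alternative and reference alleles across replicates. The use of a pseudocount, the replicate averaging, and all names are assumptions of this illustration; published analyses generally add a statistical test and multiple-testing correction, which are omitted here.

```python
import math
from statistics import mean


def allelic_skew(ref_reps, alt_reps, pseudocount=1.0):
    """Return mean log2 activity of the alt allele minus that of the ref allele.

    ref_reps / alt_reps: lists of (rna_count, dna_count), one pair per replicate.
    Both alleles are assumed to come from the same pooled library, so
    sequencing depth cancels in the difference.
    """
    def log_activity(rna, dna):
        return math.log2((rna + pseudocount) / (dna + pseudocount))

    ref = mean(log_activity(rna, dna) for rna, dna in ref_reps)
    alt = mean(log_activity(rna, dna) for rna, dna in alt_reps)
    return alt - ref


if __name__ == "__main__":
    reference = [(99, 99), (199, 199)]
    alternative = [(399, 99), (799, 199)]
    print(allelic_skew(reference, alternative))
```

A positive value indicates that the alternative allele drives higher reporter expression than the reference allele; variants with large, reproducible skews are candidates for follow-up as potentially causal.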

While most of these studies have focused on sequence variants that reside within enhancers or promoters, a few studies investigated other classes of non-coding variants that may affect post-transcriptional regulation including RNA splicing (Cheung et al. 2019), RNA stability (Griesemer et al. 2021; Lagunas et al. 2021; Lim et al. 2021; Zhao et al. 2014) and translational efficiency (Lagunas et al. 2021; Lim et al. 2021; Sample et al. 2019; Zhao et al. 2014). For example, Cheung et al. developed a multiplexed assay to test the ability of 27,733 human variants to alter exon recognition and RNA splicing with a synthetic DNA library (Cheung et al. 2019). This assay utilizes a set of reporters containing multiple introns and exons, out of which the splicing status can be monitored by GFP expression. This study found that most genetic variants capable of disrupting exon recognition fall outside of canonical splice sites, making their functional prediction challenging. Overall, results from these studies have improved the categorization of non-coding variants based on the gene regulation stages they impact. For more information on this topic, see recent reviews (Findlay 2021; Gasperini et al. 2016; McAfee et al. 2022; Tabet et al. 2022).

Concluding Remarks and Future Perspectives

Synthetic DNA libraries have been widely used to interrogate the process of gene expression, from upstream events such as TF binding and chromatin opening all the way to post-transcriptional and translational regulation. Compared with more traditional genetic approaches, i.e. perturbing the expression of proteins involved in transcriptional regulation and measuring the corresponding changes on the native genome, the library approach offers a few major advantages. First, depletion or overexpression of proteins, especially those with key functions in gene regulation, is likely to have global and pleiotropic effects. In the library approach, DNA rather than protein is manipulated, so gene regulation can be studied in an essentially wild-type setting with little concern for such side effects. Second, each step of gene regulation is a multi-variable process, and it is hard to decouple the effects of these variables based on native gene expression (the dimension of the variable space is high). Many of these variables are also constrained by evolution, preventing us from broadly testing the variable space (the explorable range in each dimension is limited). In contrast, by systematically manipulating DNA sequences, we can address both problems: the number of variables is reduced, and each can be explored over a wide range without evolutionary constraints. Finally, unlike the genetic approach, which investigates one protein at a time, the library method can use different motifs to simultaneously measure the effects of many different factors, thus increasing the throughput.
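The idea of reducing the number of variables while sampling each one broadly can be sketched as a small combinatorial design. The example below is only an assumption-laden illustration, not a design taken from any cited study: the motif sequences, positions, copy numbers, and the all-A backbone are placeholders chosen for readability.

```python
from itertools import product

# Hypothetical design axes; every value below is a placeholder.
motifs = {"TF_A": "TGACTCA", "TF_B": "CACGTG"}
positions = [10, 30, 50]   # motif start within the backbone
copies = [1, 2]            # number of tandem motif copies
background = "A" * 80      # stand-in for a neutral backbone sequence

def build(motif_seq, start, n_copies):
    """Overwrite the backbone with n_copies of the motif, beginning at start."""
    insert = motif_seq * n_copies
    return background[:start] + insert + background[start + len(insert):]

# One library member for every combination of motif, position and copy number.
library = {
    (name, pos, n): build(seq, pos, n)
    for (name, seq), pos, n in product(motifs.items(), positions, copies)
}

print(len(library), "designed sequences")
```

Because every sequence differs from the others only along the chosen axes, a difference in readout between two library members can be attributed to motif identity, position, or copy number rather than to uncontrolled sequence context.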

The advantages of the synthetic library come with costs. This method only applies to systems that are amenable to genetic manipulation. The design of the library often requires prior knowledge of sequence elements that are important for gene regulation, e.g. TF binding motifs, and it is therefore difficult to apply to less well-characterized species. Additionally, the process of making DNA libraries and introducing them into cells can be time-consuming, especially if the library is to be integrated into the genome in a site-specific fashion. In a mixed cell library, the number of cells containing each sequence tends to be small, and therefore the readouts, e.g. chromatin states or gene expression level, are likely to suffer from a lower signal-to-noise ratio than the same measurements carried out on a homogeneous population of cells. Nevertheless, we are confident that these problems will be solved with the development of more efficient genetic methods and more sensitive chromatin/gene expression assays. Ultimately, the synthetic DNA library approach has made and will continue to make significant contributions to elucidating the genetic rules of gene regulation.
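The signal-to-noise limitation of pooled libraries can be made concrete with a back-of-the-envelope sketch. It assumes, purely for illustration, that cells are evenly distributed over the library and that the readout behaves like a Poisson count, so that relative noise scales as one over the square root of the number of cells per sequence. The cell numbers and library size are hypothetical.

```python
import math

def cells_per_sequence(total_cells, library_size):
    """Average number of cells carrying each library member (even split assumed)."""
    return total_cells / library_size

def poisson_cv(mean_count):
    """Coefficient of variation of a Poisson count with the given mean."""
    return 1.0 / math.sqrt(mean_count)

total_cells = 1_000_000   # hypothetical number of cells assayed
library_size = 10_000     # hypothetical number of distinct sequences

pooled = cells_per_sequence(total_cells, library_size)
cv_pooled = poisson_cv(pooled)        # each sequence seen in a small subpopulation
cv_clonal = poisson_cv(total_cells)   # same cells, all carrying a single sequence

print(f"{pooled:.0f} cells per sequence, CV pooled = {cv_pooled:.3f}, CV clonal = {cv_clonal:.4f}")
```

Under these assumptions, spreading the same number of cells over a larger library raises the relative noise per sequence, which is one way to view the trade-off between throughput and measurement precision.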

Acknowledgements

The authors acknowledge the Bai lab for discussions and advice on the manuscript. This work is supported by the National Institutes of Health (T32 GM125592 to H.K. and R35 GM139654 to L.B.).

Footnotes

Competing interests The authors declare no competing interests.

Ethical approval Not applicable for this publication.

Consent to participate and consent to publish Not applicable for this publication.

Data availability

Not applicable for this publication.

References

  1. Adamson B, Norman TM, Jost M, Cho MY, Nuñez JK, Chen Y, Villalta JE, Gilbert LA, Horlbeck MA, Hein MY, Pak RA, Gray AN, Gross CA, Dixit A, Parnas O, Regev A, Weissman JS (2016) A multiplexed single-cell CRISPR screening platform enables systematic dissection of the unfolded protein response. Cell 167:1867–1882.e21. 10.1016/j.cell.2016.11.048
  2. Akhtar W, de Jong J, Pindyurin AV, Pagie L, Meuleman W, de Ridder J, Berns A, Wessels LFA, van Lohuizen M, van Steensel B (2013) Chromatin position effects assayed by thousands of reporters integrated in parallel. Cell 154:914–927. 10.1016/j.cell.2013.07.018
  3. Andersson R, Sandelin A (2020) Determinants of enhancer and promoter activities of regulatory elements. Nat Rev Genet 21:71–87. 10.1038/s41576-019-0173-8
  4. Andrilenas KK, Penvose A, Siggers T (2015) Using protein-binding microarrays to study transcription factor specificity: homologs, isoforms and complexes. Brief Funct Genomics 14:17–29. 10.1093/bfgp/elu046
  5. Arnold CD, Gerlach D, Stelzer C, Boryń ŁM, Rath M, Stark A (2013) Genome-wide quantitative enhancer activity maps identified by STARR-seq. Science 339:1074–1077. 10.1126/science.1232542
  6. Arnold CD, Gerlach D, Spies D, Matts JA, Sytnikova YA, Pagani M, Lau NC, Stark A (2014) Quantitative genome-wide enhancer activity maps for five Drosophila species show functional enhancer conservation and turnover during cis-regulatory evolution. Nat Genet 46:685–692. 10.1038/ng.3009
  7. Arnold CD, Zabidi MA, Pagani M, Rath M, Schernhuber K, Kazmar T, Stark A (2017) Genome-wide assessment of sequence-intrinsic enhancer responsiveness at single-base-pair resolution. Nat Biotechnol 35:136–144. 10.1038/nbt.3739
  8. Arnone MI, Davidson EH (1997) The hardwiring of development: organization and function of genomic regulatory systems. Development 124:1851–1864. 10.1242/dev.124.10.1851
  9. Arnosti DN, Kulkarni MM (2005) Transcriptional enhancers: Intelligent enhanceosomes or flexible billboards? J Cell Biochem 94:890–898. 10.1002/jcb.20352
  10. Arora A, Castro-Gutierrez R, Moffatt C, Eletto D, Becker R, Brown M, Moor AE, Russ HA, Taliaferro JM (2022) High throughput identification of RNA localization elements in neuronal cells. Nucleic Acids Res 50:10626–10642. 10.1093/nar/gkac763
  11. Audic Y, Hartley RS (2004) Post-transcriptional regulation in cancer. Biol Cell 96:479–498. 10.1016/j.biolcel.2004.05.002
  12. Bai L, Morozov AV (2010) Gene regulation by nucleosome positioning. Trends Genet 26:476–483. 10.1016/j.tig.2010.08.003
  13. Bai L, Charvin G, Siggia ED, Cross FR (2010) Nucleosome-depleted regions in cell-cycle-regulated promoters ensure reliable gene expression in every cell cycle. Dev Cell 18:544–555. 10.1016/j.devcel.2010.02.007
  14. Bai L, Ondracka A, Cross FR (2011) Multiple sequence-specific factors generate the nucleosome-depleted region on CLN2 promoter. Mol Cell 42:465–476. 10.1016/j.molcel.2011.03.028
  15. Banerji J, Rusconi S, Schaffner W (1981) Expression of a β-globin gene is enhanced by remote SV40 DNA sequences. Cell 27:299–308. 10.1016/0092-8674(81)90413-X
  16. Barakat TS, Halbritter F, Zhang M, Rendeiro AF, Perenthaler E, Bock C, Chambers I (2018) Functional dissection of the enhancer repertoire in human embryonic stem cells. Cell Stem Cell 23:276–288.e278. 10.1016/j.stem.2018.06.014
  17. Bergman DT, Jones TR, Liu V, Ray J, Jagoda E, Siraj L, Kang HY, Nasser J, Kane M, Rios A et al. (2022) Compatibility rules of human enhancer and promoter sequences. Nature 607:176–184. 10.1038/s41586-022-04877-w
  18. Boyle AP, Davis S, Shulha HP, Meltzer P, Margulies EH, Weng Z, Furey TS, Crawford GE (2008) High-resolution mapping and characterization of open chromatin across the genome. Cell 132:311–322. 10.1016/j.cell.2007.12.014
  19. Buenrostro JD, Giresi PG, Zaba LC, Chang HY, Greenleaf WJ (2013) Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nat Methods 10:1213–1218. 10.1038/nmeth.2688
  20. Burke TW, Kadonaga JT (1997) The downstream core promoter element, DPE, is conserved from Drosophila to humans and is recognized by TAFII60 of Drosophila. Genes Dev 11:3020–3031. 10.1101/gad.11.22.3020
  21. Butler JEF, Kadonaga JT (2001) Enhancer–promoter specificity mediated by DPE or TATA core promoter motifs. Genes Dev 15:2515–2519. 10.1101/gad.924301
  22. Cai Y, Yu X, Hu S, Yu J (2009) A brief review on the mechanisms of miRNA regulation. Genom Proteom Bioinform 7:147–154. 10.1016/S1672-0229(08)60044-3
  23. Calhoun VC, Stathopoulos A, Levine M (2002) Promoter-proximal tethering elements regulate enhancer-promoter specificity in the Drosophila Antennapedia complex. Proc Natl Acad Sci USA 99:9243–9247. 10.1073/pnas.142291299
  24. Calo E, Wysocka J (2013) Modification of enhancer chromatin: what, how, and why? Mol Cell 49:825–837. 10.1016/j.molcel.2013.01.038
  25. Canver MC, Smith EC, Sher F, Pinello L, Sanjana NE, Shalem O, Chen DD, Schupp PG, Vinjamur DS, Garcia SP et al. (2015) BCL11A enhancer dissection by Cas9-mediated in situ saturating mutagenesis. Nature 527:192–197. 10.1038/nature15521
  26. Cao J, Novoa EM, Zhang Z, Chen WCW, Liu D, Choi GCG, Wong ASL, Wehrspaun C, Kellis M, Lu TK (2021) High-throughput 5’ UTR engineering for enhanced protein production in non-viral gene therapies. Nat Commun 12:4138. 10.1038/s41467-021-24436-7
  27. Catarino RR, Stark A (2018) Assessing sufficiency and necessity of enhancer activities for gene expression and the mechanisms of transcription activation. Genes Dev 32:202–223. 10.1101/gad.310367.117
  28. Chaudhari HG, Cohen BA (2018) Local sequence features that influence AP-1 cis-regulatory activity. Genome Res 28:171–181. 10.1101/gr.226530.117
  29. Chavez A, Scheiman J, Vora S, Pruitt BW, Tuttle MPR, Iyer E, Lin S, Kiani S, Guzman CD, Wiegand DJ et al. (2015) Highly efficient Cas9-mediated transcriptional programming. Nat Methods 12:326–328. 10.1038/nmeth.3312
  30. Chen C-Y, Morris Q, Mitchell JA (2012) Enhancer identification in mouse embryonic stem cells using integrative modeling of chromatin and genomic features. BMC Genomics 13:152. 10.1186/1471-2164-13-152
  31. Chen L, Fish AE, Capra JA (2018) Prediction of gene regulatory enhancers across species reveals evolutionarily conserved sequence properties. PLOS Computational Biology 14:e1006484. 10.1371/journal.pcbi.1006484
  32. Chen H, Kharerin H, Dhasarathy A, Kladde M, Bai L (2022a) Partitioned usage of chromatin remodelers by nucleosome-displacing factors. Cell Rep 40:111250. 10.1016/j.celrep.2022.111250
  33. Chen PB, Fiaux PC, Zhang K, Li B, Kubo N, Jiang S, Hu R, Rooholfada E, Wu S, Wang M et al. (2022b) Systematic discovery and functional dissection of enhancers needed for cancer cell fitness and proliferation. Cell Reports 41:111630. 10.1016/j.celrep.2022.111630
  34. Cheung R, Insigne KD, Yao D, Burghard CP, Wang J, Hsiao Y-HE, Jones EM, Goodman DB, Xiao X, Kosuri S (2019) A multiplexed assay for exon recognition reveals that an unappreciated fraction of rare genetic variants cause large-effect splicing disruptions. Mol Cell 73:183–194.e188. 10.1016/j.molcel.2018.10.037
  35. Collias D, Beisel CL (2021) CRISPR technologies and the search for the PAM-free nuclease. Nat Commun 12:555. 10.1038/s41467-020-20633-y
  36. Cooper SJ, Trinklein ND, Anton ED, Nguyen L, Myers RM (2006) Comprehensive analysis of transcriptional promoter structure and function in 1% of the human genome. Genome Res 16:1–10. 10.1101/gr.4222606
  37. Corbett AH (2018) Post-transcriptional regulation of gene expression and human disease. Curr Opin Cell Biol 52:96–104. 10.1016/j.ceb.2018.02.011
  38. Dao LTM, Galindo-Albarrán AO, Castro-Mondragon JA, Andrieu-Soler C, Medina-Rivera A, Souaid C, Charbonnier G, Griffon A, Vanhille L, Stephen T et al. (2017) Genome-wide characterization of mammalian promoters with distal enhancer functions. Nat Genet 49:1073–1081. 10.1038/ng.3884
  39. Datlinger P, Rendeiro AF, Schmidl C, Krausgruber T, Traxler P, Klughammer J, Schuster LC, Kuchler A, Alpar D, Bock C (2017) Pooled CRISPR screening with single-cell transcriptome readout. Nat Methods 14:297–301. 10.1038/nmeth.4177
  40. de Boer CG, Vaishnav ED, Sadeh R, Abeyta EL, Friedman N, Regev A (2020) Deciphering eukaryotic gene-regulatory logic with 100 million random promoters. Nat Biotechnol 38:56–65. 10.1038/s41587-019-0315-8
  41. De Santa F, Barozzi I, Mietton F, Ghisletti S, Polletti S, Tusi BK, Muller H, Ragoussis J, Wei C-L, Natoli G (2010) A large fraction of extragenic RNA Pol II transcription sites overlap enhancers. PLoS Biol 8:e1000384. 10.1371/journal.pbio.1000384
  42. Diao Y, Li B, Meng Z, Jung I, Lee AY, Dixon J, Maliskova L, Guan K-L, Shen Y, Ren B (2016) A new class of temporarily phenotypic enhancers identified by CRISPR/Cas9-mediated genetic screening. Genome Res 26:397–405. 10.1101/gr.197152.115
  43. Diao Y, Fang R, Li B, Meng Z, Yu J, Qiu Y, Lin KC, Huang H, Liu T, Marina RJ et al. (2017) A tiling-deletion-based genetic screen for cis-regulatory element identification in mammalian cells. Nat Methods 14:629–635. 10.1038/nmeth.4264
  44. Dickel DE, Zhu Y, Nord AS, Wylie JN, Akiyama JA, Afzal V, Plajzer-Frick I, Kirkpatrick A, Göttgens B, Bruneau BG et al. (2014) Function-based identification of mammalian enhancers using site-specific integration. Nat Methods 11:566–571. 10.1038/nmeth.2886
  45. Dixit A, Parnas O, Li B, Chen J, Fulco CP, Jerby-Arnon L, Marjanovic ND, Dionne D, Burks T, Raychowdhury R et al. (2016) Perturb-Seq: dissecting molecular circuits with scalable single-cell RNA profiling of pooled genetic screens. Cell 167:1853–1866.e1817. 10.1016/j.cell.2016.11.038
  46. Doan RN, Bae BI, Cubelos B, Chang C, Hossain AA, Al-Saad S, Mukaddes NM, Oner O, Al-Saffar M, Balkhy S et al. (2016) Mutations in human accelerated regions disrupt cognition and social behavior. Cell 167:341–354.e312. 10.1016/j.cell.2016.08.071
  47. Doni Jayavelu N, Jajodia A, Mishra A, Hawkins RD (2020) Candidate silencer elements for the human and mouse genomes. Nat Commun 11:1061. 10.1038/s41467-020-14853-5
  48. Dvir S, Velten L, Sharon E, Zeevi D, Carey LB, Weinberger A, Segal E (2013) Deciphering the rules by which 5′-UTR sequences affect protein expression in yeast. Proc Natl Acad Sci USA 110. 10.1073/pnas.1222534110
  49. Engreitz JM, Haines JE, Perez EM, Munson G, Chen J, Kane M, McDonel PE, Guttman M, Lander ES (2016) Local regulation of gene expression by lncRNA promoters, transcription and splicing. Nature 539:452–455. 10.1038/nature20149
  50. Erceg J, Saunders TE, Girardot C, Devos DP, Hufnagel L, Furlong EEM (2014) Subtle changes in motif positioning cause tissue-specific effects on robustness of an enhancer’s activity. PLoS Genet 10:e1004060. 10.1371/journal.pgen.1004060
  51. Ernst J, Melnikov A, Zhang X, Wang L, Rogov P, Mikkelsen TS, Kellis M (2016) Genome-scale high-resolution mapping of activating and repressive nucleotides in regulatory regions. Nat Biotechnol 34:1180–1190. 10.1038/nbt.3678
  52. Esvelt KM, Mali P, Braff JL, Moosburner M, Yaung SJ, Church GM (2013) Orthogonal Cas9 proteins for RNA-guided gene regulation and editing. Nat Methods 10:1116–1121. 10.1038/nmeth.2681
  53. Even DY, Kedmi A, Basch-Barzilay S, Ideses D, Tikotzki R, Shir-Shapira H, Shefi O, Juven-Gershon T (2016) Engineered promoters for potent transient overexpression. PLoS One 11:e0148918. 10.1371/journal.pone.0148918
  54. Findlay GM (2021) Linking genome variants to disease: scalable approaches to test the functional impact of human mutations. Hum Mol Genet 30:R187–R197. 10.1093/hmg/ddab219
  55. Fiore C, Cohen BA (2016) Interactions between pluripotency factors specify cis-regulation in embryonic stem cells. Genome Res 26:778–786. 10.1101/gr.200733.115
  56. Fitz J, Neumann T, Steininger M, Wiedemann E-M, Garcia AC, Athanasiadis A, Schoeberl UE, Pavri R (2020) Spt5-mediated enhancer transcription directly couples enhancer activation with physical promoter interaction. Nat Genet 52:505–515. 10.1038/s41588-020-0605-6
  57. FitzGerald PC, Sturgill D, Shyakhtenko A, Oliver B, Vinson C (2006) Comparative genomics of Drosophila and human core promoters. Genome Biol 7:R53. 10.1186/gb-2006-7-7-r53
  58. Fulco CP, Munschauer M, Anyoha R, Munson G, Grossman SR, Perez EM, Kane M, Cleary B, Lander ES, Engreitz JM (2019) Systematic mapping of functional enhancer–promoter connections with CRISPR interference. Science 354:769–773. 10.1126/science.aag2445
  59. Gasperini M, Starita L, Shendure J (2016) The power of multiplexed functional analysis of genetic variants. Nat Protoc 11:1782–1787. 10.1038/nprot.2016.135
  60. Gasperini M, Hill AJ, McFaline-Figueroa JL, Martin B, Kim S, Zhang MD, Jackson D, Leith A, Schreiber J, Noble WS et al. (2019) A genome-wide framework for mapping gene regulation via cellular genetic screens. Cell 176:377–390.e319. 10.1016/j.cell.2018.11.029
  61. Gertz J, Siggia ED, Cohen BA (2009) Analysis of combinatorial cis-regulation in synthetic and genomic promoters. Nature 457:215–218. 10.1038/nature07521
  62. Gilbert LA, Larson MH, Morsut L, Liu Z, Brar GA, Torres SE, Stern-Ginossar N, Brandman O, Whitehead EH, Doudna JA et al. (2013) CRISPR-mediated modular RNA-guided regulation of transcription in eukaryotes. Cell 154:442–451. 10.1016/j.cell.2013.06.044
  63. Giresi PG, Kim J, McDaniell RM, Iyer VR, Lieb JD (2007) FAIRE (Formaldehyde-Assisted Isolation of Regulatory Elements) isolates active regulatory elements from human chromatin. Genome Res 17:877–885. 10.1101/gr.5533506
  64. Glisovic T, Bachorik JL, Yong J, Dreyfuss G (2008) RNA-binding proteins and post-transcriptional gene regulation. FEBS Lett 582:1977–1986. 10.1016/j.febslet.2008.03.004
  65. Griesemer D, Xue JR, Reilly SK, Ulirsch JC, Kukreja K, Davis JR, Kanai M, Yang DK, Butts JC, Guney MH et al. (2021) Genome-wide functional screen of 3’UTR variants uncovers causal variants for human disease and evolution. Cell 184:5247–5260.e19. 10.1016/j.cell.2021.08.025
  66. Grossman SR, Zhang X, Wang L, Engreitz J, Melnikov A, Rogov P, Tewhey R, Isakova A, Deplancke B, Bernstein BE et al. (2017) Systematic dissection of genomic features determining transcription factor binding and enhancer function. Proc Natl Acad Sci USA 114. 10.1073/pnas.1621150114
  67. Haberle V, Stark A (2018) Eukaryotic core promoters and the functional basis of transcription initiation. Nat Rev Mol Cell Biol 19:621–637. 10.1038/s41580-018-0028-8
  68. Hammelman J, Krismer K, Banerjee B, Gifford DK, Sherwood RI (2020) Identification of determinants of differential chromatin accessibility through a massively parallel genome-integrated reporter assay. Genome Res 30:1468–1480. 10.1101/gr.263228.120
  69. Hardison R (2000) Conserved noncoding sequences are reliable guides to regulatory elements. Trends Genet 16:369–372. 10.1016/S0168-9525(00)02081-3
  70. Heintzman ND, Stuart RK, Hon G, Fu Y, Ching CW, Hawkins RD, Barrera LO, Van Calcar S, Qu C, Ching KA et al. (2007) Distinct and predictive chromatin signatures of transcriptional promoters and enhancers in the human genome. Nat Genet 39:311–318. 10.1038/ng1966
  71. Heintzman ND, Hon GC, Hawkins RD, Kheradpour P, Stark A, Harp LF, Ye Z, Lee LK, Stuart RK, Ching CW et al. (2009) Histone modifications at human enhancers reflect global cell-type-specific gene expression. Nature 459:108–112. 10.1038/nature07829
  72. Henriques T, Scruggs BS, Inouye MO, Muse GW, Williams LH, Burkholder AB, Lavender CA, Fargo DC, Adelman K (2018) Widespread transcriptional pausing and elongation control at enhancers. Genes Dev 32:26–41. 10.1101/gad.309351.117
  73. Hinnebusch AG, Ivanov IP, Sonenberg N (2016) Translational control by 5’-untranslated regions of eukaryotic mRNAs. Science 352:1413–1416. 10.1126/science.aad9868
  74. Hirabayashi S, Bhagat S, Matsuki Y, Takegami Y, Uehata T, Kanemaru A, Itoh M, Shirakawa K, Takaori-Kondo A, Takeuchi O et al. (2019) NET-CAGE characterizes the dynamics and topology of human transcribed cis-regulatory elements. Nat Genet 51:1369–1379. 10.1038/s41588-019-0485-9
  75. Hong CKY, Cohen BA (2022) Genomic environments scale the activities of diverse core promoters. Genome Res 32:85–96. 10.1101/gr.276025.121
  76. Hornung G, Bar-Ziv R, Rosin D, Tokuriki N, Tawfik DS, Oren M, Barkai N (2012) Noise–mean relationship in mutated promoters. Genome Res 22:2409–2417. 10.1101/gr.139378.112
  77. Inoue F, Kircher M, Martin B, Cooper GM, Witten DM, McManus MT, Ahituv N, Shendure J (2017) A systematic comparison reveals substantial differences in chromosomal versus episomal encoding of enhancer activity. Genome Res 27:38–52. 10.1101/gr.212092.116
  78. Inoue F, Kreimer A, Ashuach T, Ahituv N, Yosef N (2019) Identification and Massively Parallel Characterization of Regulatory Elements Driving Neural Induction. Cell Stem Cell 25:713–727.e710. 10.1016/j.stem.2019.09.010
  79. Inukai S, Kock KH, Bulyk ML (2017) Transcription factor-DNA binding: beyond binding site motifs. Curr Opin Genet Dev 43:110–119. 10.1016/j.gde.2017.02.007
  80. Isbel L, Grand RS, Schubeler D (2022) Generating specificity in genome regulation through transcription factor sensitivity to chromatin. Nat Rev Genet 23:728–740. 10.1038/s41576-022-00512-6
  81. Jaitin DA, Weiner A, Yofe I, Lara-Astiaso D, Keren-Shaul H, David E, Salame TM, Tanay A, van Oudenaarden A, Amit I (2016) Dissecting immune circuits by linking CRISPR-pooled screens with single-cell RNA-Seq. Cell 167:1883–1896.e1815. 10.1016/j.cell.2016.11.039
  82. Jessen WJ, Dhasarathy A, Hoose SA, Carvin CD, Risinger AL, Kladde MP (2004) Mapping chromatin structure in vivo using DNA methyltransferases. Methods 33:68–80. 10.1016/j.ymeth.2003.10.025
  83. Jessen WJ, Hoose SA, Kilgore JA, Kladde MP (2006) Active PHO5 chromatin encompasses variable numbers of nucleosomes at individual promoters. Nat Struct Mol Biol 13:256–263. 10.1038/nsmb1062
  84. Jiang C, Pugh BF (2009) Nucleosome positioning and gene regulation: advances through genomics. Nat Rev Genet 10:161–172. 10.1038/nrg2522
  85. Jinek M, Chylinski K, Fonfara I, Hauer M, Doudna JA, Charpentier E (2012) A programmable dual-RNA–guided DNA endonuclease in adaptive bacterial immunity. Science 337:816–821. 10.1126/science.1225829
  86. Juven-Gershon T, Cheng S, Kadonaga JT (2006) Rational design of a super core promoter that enhances gene expression. Nat Methods 3:917–922. 10.1038/nmeth937
  87. Keene JD (2007) RNA regulons: coordination of post-transcriptional events. Nat Rev Genet 8:533–543. 10.1038/nrg2111
  88. Khabar KS (2010) Post-transcriptional control during chronic inflammation and cancer: a focus on AU-rich elements. Cell Mol Life Sci 67:2937–2955. 10.1007/s00018-010-0383-x
  89. Kharerin H, Bai L (2021) Thermodynamic modeling of genome-wide nucleosome depleted regions in yeast. PLoS Comput Biol 17:e1008560. 10.1371/journal.pcbi.1008560
  90. Kheradpour P, Ernst J, Melnikov A, Rogov P, Wang L, Zhang X, Alston J, Mikkelsen TS, Kellis M (2013) Systematic dissection of regulatory motifs in 2000 predicted human enhancers using a massively parallel reporter assay. Genome Res 23:800–811. 10.1101/gr.144899.112
  91. Kim T-K, Shiekhattar R (2015) Architectural and Functional Commonalities between Enhancers and Promoters. Cell 162:948–959. 10.1016/j.cell.2015.08.008
  92. Kim T-K, Hemberg M, Gray JM, Costa AM, Bear DM, Wu J, Harmin DA, Laptewicz M, Barbara-Haley K, Kuersten S et al. (2010) Widespread transcription at neuronal activity-regulated enhancers. Nature 465:182–187. 10.1038/nature09033
  93. Kinney JB, Murugan A, Callan CG, Cox EC (2010) Using deep sequencing to characterize the biophysical mechanism of a transcriptional regulatory sequence. Proc Natl Acad Sci USA 107:9158–9163. 10.1073/pnas.1004290107
  94. Klann TS, Black JB, Chellappan M, Safi A, Song L, Hilton IB, Crawford GE, Reddy TE, Gersbach CA (2017) CRISPR–Cas9 epigenome editing enables high-throughput screening for functional regulatory elements in the human genome. Nat Biotechnol 35:561–568. 10.1038/nbt.3853
  95. Klein JC, Chen W, Gasperini M, Shendure J (2018a) Identifying Novel Enhancer Elements with CRISPR-Based Screens. ACS Chem Biol 13:326–332. 10.1021/acschembio.7b00778
  96. Klein JC, Keith A, Agarwal V, Durham T, Shendure J (2018b) Functional characterization of enhancer evolution in the primate lineage. Genome Biol 19:99. 10.1186/s13059-018-1473-6
  97. Klein JC, Keith A, Rice SJ, Shepherd C, Agarwal V, Loughlin J, Shendure J (2019) Functional testing of thousands of osteoarthritis-associated variants for regulatory activity. Nat Commun 10:2434. 10.1038/s41467-019-10439-y
  98. Kleinstiver BP, Prew MS, Tsai SQ, Nguyen NT, Topkar VV, Zheng Z, Joung JK (2015a) Broadening the targeting range of Staphylococcus aureus CRISPR-Cas9 by modifying PAM recognition. Nat Biotechnol 33:1293–1298. 10.1038/nbt.3404
  99. Kleinstiver BP, Prew MS, Tsai SQ, Topkar VV, Nguyen NT, Zheng Z, Gonzales APW, Li Z, Peterson RT, Yeh J-RJ et al. (2015b) Engineered CRISPR-Cas9 nucleases with altered PAM specificities. Nature 523:481–485. 10.1038/nature14592
  100. Korkmaz G, Lopes R, Ugalde AP, Nevedomskaya E, Han R, Myacheva K, Zwart W, Elkon R, Agami R (2016) Functional genetic screens for enhancer elements in the human genome using CRISPR-Cas9. Nat Biotechnol 34:192–198. 10.1038/nbt.3450
  101. Kotopka BJ, Smolke CD (2020) Model-driven generation of artificial yeast promoters. Nat Commun 11:2113. 10.1038/s41467-020-15977-4
  102. Kreimer A, Yan Z, Ahituv N, Yosef N (2019) Meta-analysis of massively parallel reporter assays enables prediction of regulatory function across cell types. Hum Mutat 40:1299–1313. 10.1002/humu.23820
  103. Kreimer A, Ashuach T, Inoue F, Khodaverdian A, Deng C, Yosef N, Ahituv N (2022) Massively parallel reporter perturbation assays uncover temporal regulatory architecture during neural differentiation. Nat Commun 13:1504. 10.1038/s41467-022-28659-0
  104. Kribelbauer JF, Rastogi C, Bussemaker HJ, Mann RS (2019) Low-Affinity Binding Sites and the Transcription Factor Specificity Paradox in Eukaryotes. Annu Rev Cell Dev Biol 35:357–379. 10.1146/annurev-cellbio-100617-062719
  105. Krietenstein N, Wal M, Watanabe S, Park B, Peterson CL, Pugh BF, Korber P (2016) Genomic Nucleosome Organization Reconstituted with Pure Proteins. Cell 167:709–721.e712. 10.1016/j.cell.2016.09.045
  106. Kwasnieski JC, Fiore C, Chaudhari HG, Cohen BA (2014) High-throughput functional testing of ENCODE segmentation predictions. Genome Res 24:1595–1602. 10.1101/gr.173518.114
  107. Lagunas T Jr, Plassmeyer SP, Friedman RZ, Rieger MA, Fischer AD, Lucero AFA, An J-Y, Sanders SJ, Cohen BA, Dougherty JD (2021) A Cre-dependent massively parallel reporter assay allows for cell-type specific assessment of the functional effects of genetic variants in vivo. bioRxiv 2021.05.17.444514
  108. Landolin JM, Johnson DS, Trinklein ND, Aldred SF, Medina C, Shulha H, Weng Z, Myers RM (2010) Sequence features that drive human promoter function and tissue specificity. Genome Res 20:890–898. 10.1101/gr.100370.109 [DOI] [PMC free article] [PubMed] [Google Scholar]
  109. Leppek K, Das R, Barna M (2018) Functional 5’ UTR mRNA structures in eukaryotic translation regulation and how to find them. Nat Rev Mol Cell Biol 19:158–174. 10.1038/nrm.2017.103 [DOI] [PMC free article] [PubMed] [Google Scholar]
  110. Levo M, Avnit-Sagi T, Lotan-Pompan M, Kalma Y, Weinberger A, Yakhini Z, Segal E (2017) Systematic Investigation of Transcription Factor Activity in the Context of Chromatin Using Massively Parallel Binding and Expression Assays. Mol Cell 65(604–617):e606. 10.1016/j.molcel.2017.01.007 [DOI] [PubMed] [Google Scholar]
  111. Li X, Noll M (1994) Compatibility between enhancers and promoters determines the transcriptional specificity of gooseberry and gooseberry neuro in the Drosophila embryo. EMBO J 13:400–406. 10.1002/j.1460-2075.1994.tb06274.x [DOI] [PMC free article] [PubMed] [Google Scholar]
  112. Lim Y, Arora S, Schuster SL, Corey L, Fitzgibbon M, Wladyka CL, Wu X, Coleman IM, Delrow JJ, Corey E et al. (2021) Multiplexed functional genomic analysis of 5’ untranslated region mutations across the spectrum of prostate cancer. Nat Commun 12:4217. 10.1038/s41467-021-24445-6 [DOI] [PMC free article] [PubMed] [Google Scholar]
  113. Litterman AJ, Kageyama R, Le Tonqueze O, Zhao W, Gagnon JD, Goodarzi H, Erle DJ, Ansel KM (2019) A massively parallel 3’ UTR reporter assay reveals relationships between nucleotide content, sequence conservation, and mRNA destabilization. Genome Res 29:896–906. 10.1101/gr.242552.118 [DOI] [PMC free article] [PubMed] [Google Scholar]
  114. Liu Y, Yu S, Dhiman VK, Brunetti T, Eckart H, White KP (2017a) Functional assessment of human enhancer activities using whole-genome STARR-sequencing. Genome Biol 18:219. 10.1186/s13059-017-1345 [DOI] [PMC free article] [PubMed] [Google Scholar]
  115. Liu S, Liu Y, Zhang Q, Wu J, Liang J, Yu S, Wei G-H, White KP, Wang X (2017b) Systematic identification of regulatory variants associated with cancer risk. Genome Biol 18:194. 10.1186/s13059-017-1322-z [DOI] [PMC free article] [PubMed] [Google Scholar]
  116. Liu J, Shively CA, Mitra RD (2020) Quantitative analysis of transcription factor binding and expression using calling cards reporter arrays. Nucleic Acids Res 48:e50. 10.1093/nar/gkaa141 [DOI] [PMC free article] [PubMed] [Google Scholar]
  117. Lopes R, Korkmaz G, Agami R (2016) Applying CRISPR–Cas9 tools to identify and characterize transcriptional enhancers. Nat Rev Mol Cell Biol 17:597–604. 10.1038/nrm.2016.79 [DOI] [PubMed] [Google Scholar]
  118. Lorch Y, Maier-Davis B, Kornberg RD (2014) Role of DNA sequence in chromatin remodeling and the formation of nucleosome-free regions. Genes Dev 28:2492–2497. 10.1101/gad.250704.114 [DOI] [PMC free article] [PubMed] [Google Scholar]
  119. Lubelsky Y, Ulitsky I (2018) Sequences enriched in Alu repeats drive nuclear localization of long RNAs in human cells. Nature 555:107–111. 10.1038/nature25757 [DOI] [PMC free article] [PubMed] [Google Scholar]
  120. Lubliner S, Regev I, Lotan-Pompan M, Edelheit S, Weinberger A, Segal E (2015) Core promoter sequence in yeast is a major determinant of expression level. Genome Res 25:1008–1017. 10.1101/gr.188193.114 [DOI] [PMC free article] [PubMed] [Google Scholar]
  121. Maricque BB, Dougherty JD, Cohen BA (2016) A genome-integrated massively parallel reporter assay reveals DNA sequence determinants of cis-regulatory activity in neural cells. Nucleic Acids Res. gkw942. 10.1093/nar/gkw942 [DOI] [PMC free article] [PubMed] [Google Scholar]
  122. Martinez-Ara M, Comoglio F, van Arensbergen J, van Steensel B (2022) Systematic analysis of intrinsic enhancer-promoter compatibility in the mouse genome. Mol Cell 82:2519–2531.e6. 10.1016/j.molcel.2022.04.009 [DOI] [PMC free article] [PubMed] [Google Scholar]
  123. Matoba N, Liang D, Sun H, Aygun N, McAfee JC, Davis JE, Raffield LM, Qian H, Piven J, Li Y et al. (2020) Common genetic risk variants identified in the SPARK cohort support DDHD2 as a candidate risk gene for autism. Transl Psychiatry 10:265. 10.1038/s41398-020-00953-9 [DOI] [PMC free article] [PubMed] [Google Scholar]
  124. Matoulkova E, Michalova E, Vojtesek B, Hrstka R (2012) The role of the 3’ untranslated region in post-transcriptional regulation of protein expression in mammalian cells. RNA Biol 9:563–576. 10.4161/rna.20231 [DOI] [PubMed] [Google Scholar]
  125. Mattioli K, Volders P-J, Gerhardinger C, Lee JC, Maass PG, Melé M, Rinn JL (2019) High-throughput functional analysis of lncRNA core promoters elucidates rules governing tissue specificity. Genome Res 29:344–355. 10.1101/gr.242222.118 [DOI] [PMC free article] [PubMed] [Google Scholar]
  126. Maurano MT, Humbert R, Rynes E, Thurman RE, Haugen E, Wang H, Reynolds AP, Sandstrom R, Qu H, Brody J et al. (2012) Systematic localization of common disease-associated variation in regulatory DNA. Science 337:1190–1195. 10.1126/science.1222794 [DOI] [PMC free article] [PubMed] [Google Scholar]
  127. Mayr C (2017) Regulation by 3’-Untranslated Regions. Annu Rev Genet 51:171–194. 10.1146/annurev-genet-120116-024704 [DOI] [PubMed] [Google Scholar]
  128. McAfee JC, Bell JL, Krupa O, Matoba N, Stein JL, Won H (2022) Focus on your locus with a massively parallel reporter assay. J Neurodevelop Disord 14:50. 10.1186/s11689-022-09461-x [DOI] [PMC free article] [PubMed] [Google Scholar]
  129. Melnikov A, Murugan A, Zhang X, Tesileanu T, Wang L, Rogov P, Feizi S, Gnirke A, Callan CG, Kinney JB et al. (2012) Systematic dissection and optimization of inducible enhancers in human cells using a massively parallel reporter assay. Nat Biotechnol 30:271–277. 10.1038/nbt.2137 [DOI] [PMC free article] [PubMed] [Google Scholar]
  130. Merli C, Bergstrom DE, Cygan JA, Blackman RK (1996) Promoter specificity mediates the independent regulation of neighboring genes. Genes Dev 10:1260–1270. 10.1101/gad.10.10.1260 [DOI] [PubMed] [Google Scholar]
  131. Mignone F, Gissi C, Liuni S, Pesole G (2002) Untranslated regions of mRNAs. Genome Biol 3:REVIEWS0004. 10.1186/gb-2002-3-3-reviews0004 [DOI] [PMC free article] [PubMed] [Google Scholar]
  132. Mikl M, Hamburg A, Pilpel Y, Segal E (2019) Dissecting splicing decisions and cell-to-cell variability with designed sequence libraries. Nat Commun 10:4572. 10.1038/s41467-019-12642-3 [DOI] [PMC free article] [PubMed] [Google Scholar]
  133. Mogno I, Kwasnieski JC, Cohen BA (2013) Massively parallel synthetic promoter assays reveal the in vivo effects of binding site variants. Genome Res 23:1908–1915. 10.1101/gr.157891.113 [DOI] [PMC free article] [PubMed] [Google Scholar]
  134. Monfils K, Barakat TS (2021) Models behind the mystery of establishing enhancer-promoter interactions. Eur J Cell Biol 100:151170. 10.1016/j.ejcb.2021.151170 [DOI] [PubMed] [Google Scholar]
  135. Mouri K, Guo MH, de Boer CG, Lissner MM, Harten IA, Newby GA, DeBerg HA, Platt WF, Gentili M, Liu DR et al. (2022) Prioritization of autoimmune disease-associated genetic variants that perturb regulatory element activity in T cells. Nat Genet 54:603–612. 10.1038/s41588-022-01056-5 [DOI] [PMC free article] [PubMed] [Google Scholar]
  136. Muerdter F, Boryń ŁM, Arnold CD (2015) STARR-seq — Principles and applications. Genomics 106:145–150. 10.1016/j.ygeno.2015.06.001 [DOI] [PubMed] [Google Scholar]
  137. Muerdter F, Boryń ŁM, Woodfin AR, Neumayr C, Rath M, Zabidi MA, Pagani M, Haberle V, Kazmar T, Catarino RR et al. (2018) Resolving systematic errors in widely used enhancer activity assays in human cells. Nat Methods 15:141–149. 10.1038/nmeth.4534 [DOI] [PMC free article] [PubMed] [Google Scholar]
  138. Mulvey B, Dougherty JD (2021) Transcriptional-regulatory convergence across functional MDD risk variants identified by massively parallel reporter assays. Transl Psychiatry 11:403. 10.1038/s41398-021-01493-6 [DOI] [PMC free article] [PubMed] [Google Scholar]
  139. Murtha M, Tokcaer-Keskin Z, Tang Z, Strino F, Chen X, Wang Y, Xi X, Basilico C, Brown S, Bonneau R et al. (2014) FIREWACh: high-throughput functional detection of transcriptional regulatory modules in mammalian cells. Nat Methods 11:559–565. 10.1038/nmeth.2885 [DOI] [PMC free article] [PubMed] [Google Scholar]
  140. Myers RM, Tilly K, Maniatis T (1986) Fine Structure Genetic Analysis of a β-Globin Promoter. Science 232:613–618. 10.1126/science.3457470 [DOI] [PubMed] [Google Scholar]
  141. Myint L, Wang R, Boukas L, Hansen KD, Goff LA, Avramopoulos D (2020) A screen of 1,049 schizophrenia and 30 Alzheimer’s-associated variants for regulatory potential. Am J Med Genet 183:61–73. 10.1002/ajmg.b.32761 [DOI] [PMC free article] [PubMed] [Google Scholar]
  142. Neumayr C, Haberle V, Serebreni L, Karner K, Hendy O, Boija A, Henninger JE, Li CH, Stejskal K, Lin G et al. (2022) Differential cofactor dependencies define distinct types of human enhancers. Nature 606:406–413. 10.1038/s41586-022-04779-x [DOI] [PMC free article] [PubMed] [Google Scholar]
  143. Nguyen UT, Bittova L, Muller MM, Fierz B, David Y, Houck-Loomis B, Feng V, Dann GP, Muir TW (2014) Accelerated chromatin biochemistry using DNA-barcoded nucleosome libraries. Nat Methods 11:834–840. 10.1038/nmeth.3022 [DOI] [PMC free article] [PubMed] [Google Scholar]
  144. Nguyen TA, Jones RD, Snavely AR, Pfenning AR, Kirchner R, Hemberg M, Gray JM (2016) High-throughput functional comparison of promoter and enhancer activities. Genome Res 26:1023–1033. 10.1101/gr.204834.116 [DOI] [PMC free article] [PubMed] [Google Scholar]
  145. Noderer WL, Flockhart RJ, Bhaduri A, Diaz de Arce AJ, Zhang J, Khavari PA, Wang CL (2014) Quantitative analysis of mammalian translation initiation sites by FACS-seq. Mol Syst Biol 10:748. 10.15252/msb.20145136 [DOI] [PMC free article] [PubMed] [Google Scholar]
  146. Ohtsuki S, Levine M, Cai HN (1998) Different core promoters possess distinct regulatory activities in the Drosophila embryo. Genes Dev 12:547–556. 10.1101/gad.12.4.547 [DOI] [PMC free article] [PubMed] [Google Scholar]
  147. Oikonomou P, Goodarzi H, Tavazoie S (2014) Systematic identification of regulatory elements in conserved 3’ UTRs of human transcripts. Cell Rep 7:281–292. 10.1016/j.celrep.2014.03.001 [DOI] [PMC free article] [PubMed] [Google Scholar]
  148. Orenstein Y, Shamir R (2017) Modeling protein-DNA binding via high-throughput in vitro technologies. Brief Funct Genomics 16:171–180. 10.1093/bfgp/elw030 [DOI] [PMC free article] [PubMed] [Google Scholar]
  149. Pang B, Snyder MP (2020) Systematic identification of silencers in human cells. Nat Genet 52:254–263. 10.1038/s41588-020-0578-5 [DOI] [PMC free article] [PubMed] [Google Scholar]
  150. Panigrahi A, O’Malley BW (2021) Mechanisms of enhancer action: the known and the unknown. Genome Biol 22:108. 10.1186/s13059-021-02322-1 [DOI] [PMC free article] [PubMed] [Google Scholar]
  151. Patwardhan RP, Hiatt JB, Witten DM, Kim MJ, Smith RP, May D, Lee C, Andrie JM, Lee S-I, Cooper GM et al. (2012) Massively parallel functional dissection of mammalian enhancers in vivo. Nat Biotechnol 30:265–270. 10.1038/nbt.2136 [DOI] [PMC free article] [PubMed] [Google Scholar]
  152. Peng T, Zhai Y, Atlasi Y, ter Huurne M, Marks H, Stunnenberg HG, Megchelenbrink W (2020) STARR-seq identifies active, chromatin-masked, and dormant enhancers in pluripotent mouse embryonic stem cells. Genome Biol 21:243. 10.1186/s13059-020-02156-3 [DOI] [PMC free article] [PubMed] [Google Scholar]
  153. Pennacchio LA, Ahituv N, Moses AM, Prabhakar S, Nobrega MA, Shoukry M, Minovitsky S, Dubchak I, Holt A, Lewis KD et al. (2006) In vivo enhancer analysis of human conserved non-coding sequences. Nature 444:499–502. 10.1038/nature05295 [DOI] [PubMed] [Google Scholar]
  154. Perez-Pinera P, Kocak DD, Vockley CM, Adler AF, Kabadi AM, Polstein LR, Thakore PI, Glass KA, Ousterout DG, Leong KW et al. (2013) RNA-guided gene activation by CRISPR-Cas9–based transcription factors. Nat Methods 10:973–976. 10.1038/nmeth.2600 [DOI] [PMC free article] [PubMed] [Google Scholar]
  155. Ponjavic J, Lenhard B, Kai C, Kawai J, Carninci P, Hayashizaki Y, Sandelin A (2006) Transcriptional and structural impact of TATA-initiation site spacing in mammalian core promoters. Genome Biol 7:R78. 10.1186/gb-2006-7-8-r78 [DOI] [PMC free article] [PubMed] [Google Scholar]
  156. Qi LS, Larson MH, Gilbert LA, Doudna JA, Weissman JS, Arkin AP, Lim WA (2013) Repurposing CRISPR as an RNA-guided platform for sequence-specific control of gene expression. Cell 152:1173–1183. 10.1016/j.cell.2013.02.022 [DOI] [PMC free article] [PubMed] [Google Scholar]
  157. Rabani M, Pieper L, Chew GL, Schier AF (2017) A massively parallel reporter assay of 3’ UTR sequences identifies in vivo rules for mRNA degradation. Mol Cell 68:1083–1094.e5. 10.1016/j.molcel.2017.11.014 [DOI] [PMC free article] [PubMed] [Google Scholar]
  158. Rajagopal N, Srinivasan S, Kooshesh K, Guo Y, Edwards MD, Banerjee B, Syed T, Emons BJM, Gifford DK, Sherwood RI (2016) High-throughput mapping of regulatory DNA. Nat Biotechnol 34:167–174. 10.1038/nbt.3468 [DOI] [PMC free article] [PubMed] [Google Scholar]
  159. Raveh-Sadka T, Levo M, Shabi U, Shany B, Keren L, Lotan-Pompan M, Zeevi D, Sharon E, Weinberger A, Segal E (2012) Manipulating nucleosome disfavoring sequences allows fine-tune regulation of gene expression in yeast. Nat Genet 44:743–750. 10.1038/ng.2305 [DOI] [PubMed] [Google Scholar]
  160. Ren B, Robert F, Wyrick JJ, Aparicio O, Jennings EG, Simon I, Zeitlinger J, Schreiber J, Hannett N, Kanin E et al. (2000) Genome-wide location and function of DNA binding proteins. Science 290:2306–2309. 10.1126/science.290.5500.2306 [DOI] [PubMed] [Google Scholar]
  161. Ren X, Wang M, Li B, Jamieson K, Zheng L, Jones IR, Li B, Takagi MA, Lee J, Maliskova L, Tam TW, Yu M, Hu R, Lee L, Abnousi A, Li G, Li Y, Hu M, Ren B et al. (2021a) Parallel characterization of cis-regulatory elements for multiple genes using CRISPR-path. Sci Adv 7:eabi4360. 10.1126/sciadv.abi4360 [DOI] [PMC free article] [PubMed] [Google Scholar]
  162. Ren X, Wang M, Li B, Jamieson K, Zheng L, Jones IR, Li B, Takagi MA, Lee J, Maliskova L, Tam TW, Yu M, Hu R, Lee L, Abnousi A, Li G, Li Y, Hu M, Ren B et al. (2021b) Parallel characterization of cis-regulatory elements for multiple genes using CRISPR-path. Sci Adv 7:eabi4360. 10.1126/sciadv.abi4360 [DOI] [PMC free article] [PubMed] [Google Scholar]
  163. Renganaath K, Cheung R, Day L, Kosuri S, Kruglyak L, Albert FW (2020) Systematic identification of cis-regulatory variants that cause gene expression differences in a yeast cross. eLife 9:e62669. 10.7554/eLife.62669 [DOI] [PMC free article] [PubMed] [Google Scholar]
  164. Replogle JM, Norman TM, Xu A, Hussmann JA, Chen J, Cogan JZ, Meer EJ, Terry JM, Riordan DP, Srinivas N, Fiddes IT, Arthur JG, Alvarado LJ, Pfeiffer KA, Mikkelsen TS, Weissman JS, Adamson B (2020) Combinatorial single-cell CRISPR screens by direct guide RNA capture and targeted sequencing. Nat Biotechnol 38:954–961. 10.1038/s41587-020-0470-y [DOI] [PMC free article] [PubMed] [Google Scholar]
  165. Rosenberg AB, Patwardhan RP, Shendure J, Seelig G (2015) Learning the sequence determinants of alternative splicing from millions of random sequences. Cell 163:698–711. 10.1016/j.cell.2015.09.054 [DOI] [PubMed] [Google Scholar]
  166. Roy AL, Singer DS (2015) Core promoters in transcription: old problem, new insights. Trends Biochem Sci 40:165–171. 10.1016/j.tibs.2015.01.007 [DOI] [PMC free article] [PubMed] [Google Scholar]
  167. Sahu B, Hartonen T, Pihlajamaa P, Wei B, Dave K, Zhu F, Kaasinen E, Lidschreiber K, Lidschreiber M, Daub CO et al. (2022) Sequence determinants of human gene regulatory elements. Nat Genet 54:283–294. 10.1038/s41588-021-01009-4 [DOI] [PMC free article] [PubMed] [Google Scholar]
  168. Sample PJ, Wang B, Reid DW, Presnyak V, McFadyen IJ, Morris DR, Seelig G (2019) Human 5′ UTR design and variant effect prediction from a massively parallel translation assay. Nat Biotechnol 37:803–809. 10.1038/s41587-019-0164-5 [DOI] [PMC free article] [PubMed] [Google Scholar]
  169. Sanjana NE, Wright J, Zheng K, Shalem O, Fontanillas P, Joung J, Cheng C, Regev A, Zhang F (2016) High-resolution interrogation of functional elements in the noncoding genome. Science 353:1545–1549. 10.1126/science.aaf7613 [DOI] [PMC free article] [PubMed] [Google Scholar]
  170. Schones DE, Cui K, Cuddapah S, Roh T-Y, Barski A, Wang Z, Wei G, Zhao K (2008) Dynamic regulation of nucleosome positioning in the human genome. Cell 132:887–898. 10.1016/j.cell.2008.02.022 [DOI] [PMC free article] [PubMed] [Google Scholar]
  171. Schraivogel D, Gschwind AR, Milbank JH, Leonce DR, Jakob P, Mathur L, Korbel JO, Merten CA, Velten L, Steinmetz LM (2020) Targeted Perturb-seq enables genome-scale genetic screens in single cells. Nat Methods 17:629–635. 10.1038/s41592-020-0837 [DOI] [PMC free article] [PubMed] [Google Scholar]
  172. Sharon E, Kalma Y, Sharp A, Raveh-Sadka T, Levo M, Zeevi D, Keren L, Yakhini Z, Weinberger A, Segal E (2012) Inferring gene regulatory logic from high-throughput measurements of thousands of systematically designed promoters. Nat Biotechnol 30:521–530. 10.1038/nbt.2205 [DOI] [PMC free article] [PubMed] [Google Scholar]
  173. Sharpe J, Nonchev S, Gould A, Whiting J, Krumlauf R (1998) Selectivity, sharing and competitive interactions in the regulation of Hoxb genes. EMBO J 17:1788–1798. 10.1093/emboj/17.6.1788 [DOI] [PMC free article] [PubMed] [Google Scholar]
  174. Shen SQ, Myers CA, Hughes AEO, Byrne LC, Flannery JG, Corbo JC (2016) Massively parallel cis-regulatory analysis in the mammalian central nervous system. Genome Res 26:238–255. 10.1101/gr.193789.115 [DOI] [PMC free article] [PubMed] [Google Scholar]
  175. Shlyueva D, Stampfel G, Stark A (2014a) Transcriptional enhancers: from properties to genome-wide predictions. Nat Rev Genet 15:272–286. 10.1038/nrg3682 [DOI] [PubMed] [Google Scholar]
  176. Shlyueva D, Stelzer C, Gerlach D, Yáñez-Cuna JO, Rath M, Boryń ŁM, Arnold CD, Stark A (2014b) Hormone-responsive enhancer-activity maps reveal predictive motifs, indirect repression, and targeting of closed chromatin. Mol Cell 54:180–192. 10.1016/j.molcel.2014.02.026 [DOI] [PubMed] [Google Scholar]
  177. Shukla CJ, McCorkindale AL, Gerhardinger C, Korthauer KD, Cabili MN, Shechner DM, Irizarry RA, Maass PG, Rinn JL (2018) High-throughput identification of RNA nuclear enrichment sequences. EMBO J 37. 10.15252/embj.201798452 [DOI] [PMC free article] [PubMed] [Google Scholar]
  178. Siegel DA, Le Tonqueze O, Biton A, Zaitlen N, Erle DJ (2022) Massively parallel analysis of human 3’ UTRs reveals that AU-rich element length and registration predict mRNA destabilization. G3 (Bethesda) 12. 10.1093/g3journal/jkab404 [DOI] [PMC free article] [PubMed] [Google Scholar]
  179. Slattery M, Zhou T, Yang L, Dantas Machado AC, Gordan R, Rohs R (2014) Absence of a simple code: how transcription factors read the genome. Trends Biochem Sci 39:381–399. 10.1016/j.tibs.2014.07.002 [DOI] [PMC free article] [PubMed] [Google Scholar]
  180. Smale ST, Baltimore D (1989) The “initiator” as a transcription control element. Cell 57:103–113. 10.1016/0092-8674(89)90176-1 [DOI] [PubMed] [Google Scholar]
  181. Small EC, Xi L, Wang JP, Widom J, Licht JD (2014) Single-cell nucleosome mapping reveals the molecular basis of gene expression heterogeneity. Proc Natl Acad Sci U S A 111:E2462–E2471. 10.1073/pnas.1400517111 [DOI] [PMC free article] [PubMed] [Google Scholar]
  182. Smith RP, Taher L, Patwardhan RP, Kim MJ, Inoue F, Shendure J, Ovcharenko I, Ahituv N (2013) Massively parallel decoding of mammalian regulatory sequences supports a flexible organizational model. Nat Genet 45:1021–1028. 10.1038/ng.2713 [DOI] [PMC free article] [PubMed] [Google Scholar]
  183. Spitz F, Furlong EEM (2012) Transcription factors: from enhancer binding to developmental control. Nat Rev Genet 13:613–626. 10.1038/nrg3207 [DOI] [PubMed] [Google Scholar]
  184. Srivastava D, Mahony S (2020) Sequence and chromatin determinants of transcription factor binding and the establishment of cell type-specific binding patterns. Biochim Biophys Acta Gene Regul Mech 1863:194443. 10.1016/j.bbagrm.2019.194443 [DOI] [PMC free article] [PubMed] [Google Scholar]
  185. Szaro BG, Strong MJ (2010) Post-transcriptional control of neurofilaments: New roles in development, regeneration and neurodegenerative disease. Trends Neurosci 33:27–37. 10.1016/j.tins.2009.10.002 [DOI] [PubMed] [Google Scholar]
  186. Szczesnik T, Chu L, Ho JWK, Sherwood RI (2020) A high-throughput genome-integrated assay reveals spatial dependencies governing Tcf7l2 binding. Cell Syst 11:315–327.e5. 10.1016/j.cels.2020.08.004 [DOI] [PMC free article] [PubMed] [Google Scholar]
  187. Tabet D, Parikh V, Mali P, Roth FP, Claussnitzer M (2022) Scalable functional assays for the interpretation of human genetic variation. Annu Rev Genet 56:441–465. 10.1146/annurev-genet-072920-032107 [DOI] [PubMed] [Google Scholar]
  188. Tewhey R, Kotliar D, Park DS, Liu B, Winnicki S, Reilly SK, Andersen KG, Mikkelsen TS, Lander ES, Schaffner SF, Sabeti PC (2016) Direct identification of hundreds of expression-modulating variants using a multiplexed reporter assay. Cell 165:1519–1529. 10.1016/j.cell.2016.04.027 [DOI] [PMC free article] [PubMed] [Google Scholar]
  189. The FANTOM Consortium, Andersson R, Gebhard C, Miguel-Escalada I, Hoof I, Bornholdt J, Boyd M, Chen Y, Zhao X, Schmidl C et al. (2014) An atlas of active enhancers across human cell types and tissues. Nature 507:455–461. 10.1038/nature12787 [DOI] [PMC free article] [PubMed] [Google Scholar]
  190. The ENCODE Project Consortium, Abascal F, Acosta R, Addleman NJ, Adrian J, Afzal V, Ai R, Aken B, Akiyama JA, Jammal OA et al. (2020) Expanded encyclopaedias of DNA elements in the human and mouse genomes. Nature 583:699–710. 10.1038/s41586-020-2493-4 [DOI] [PMC free article] [PubMed] [Google Scholar]
  191. Thurman RE, Rynes E, Humbert R, Vierstra J, Maurano MT, Haugen E, Sheffield NC, Stergachis AB, Wang H, Vernot B et al. (2012) The accessible chromatin landscape of the human genome. Nature 489:75–82. 10.1038/nature11232 [DOI] [PMC free article] [PubMed] [Google Scholar]
  192. Tippens ND, Liang J, Leung AK-Y, Wierbowski SD, Ozer A, Booth JG, Lis JT, Yu H (2020) Transcription imparts architecture, function and logic to enhancer units. Nat Genet 52:1067–1075. 10.1038/s41588-020-0686-2 [DOI] [PMC free article] [PubMed] [Google Scholar]
  193. Tobias IC, Abatti LE, Moorthy SD, Mullany S, Taylor T, Khader N, Filice MA, Mitchell JA (2021) Transcriptional enhancers: from prediction to functional assessment on a genome-wide scale. Genome 64:426–448. 10.1139/gen-2020-0104 [DOI] [PubMed] [Google Scholar]
  194. Trinklein ND, Aldred SJF, Saldanha AJ, Myers RM (2003) Identification and Functional Analysis of Human Transcriptional Promoters. Genome Res 13:308–312. 10.1101/gr.794803 [DOI] [PMC free article] [PubMed] [Google Scholar]
  195. Ulirsch JC, Nandakumar SK, Wang L, Giani FC, Zhang X, Rogov P, Melnikov A, McDonel P, Do R, Mikkelsen TS, Sankaran VG (2016) Systematic functional dissection of common genetic variation affecting red blood cell traits. Cell 165:1530–1545. 10.1016/j.cell.2016.04.048 [DOI] [PMC free article] [PubMed] [Google Scholar]
  196. Vainberg Slutskin I, Weingarten-Gabbay S, Nir R, Weinberger A, Segal E (2018) Unraveling the determinants of microRNA mediated regulation using a massively parallel reporter assay. Nat Commun 9:529. 10.1038/s41467-018-02980-z [DOI] [PMC free article] [PubMed] [Google Scholar]
  197. van Arensbergen J, van Steensel B, Bussemaker HJ (2014) In search of the determinants of enhancer–promoter interaction specificity. Trends Cell Biol 24:695–702. 10.1016/j.tcb.2014.07.004 [DOI] [PMC free article] [PubMed] [Google Scholar]
  198. van Arensbergen J, FitzPatrick VD, de Haas M, Pagie L, Sluimer J, Bussemaker HJ, van Steensel B (2017) Genome-wide mapping of autonomous promoter activity in human cells. Nat Biotechnol 35:145–153. 10.1038/nbt.3754 [DOI] [PMC free article] [PubMed] [Google Scholar]
  199. van Arensbergen J, Pagie L, FitzPatrick VD, de Haas M, Baltissen MP, Comoglio F, van der Weide RH, Teunissen H, Võsa U, Franke L et al. (2019) High-throughput identification of human SNPs affecting regulatory element activity. Nat Genet 51:1160–1169. 10.1038/s41588-019-0455-2 [DOI] [PMC free article] [PubMed] [Google Scholar]
  200. Vandel J, Cassan O, Lèbre S, Lecellier C-H, Bréhélin L (2019) Probing transcription factor combinatorics in different promoter classes and in enhancers. BMC Genomics 20:103. 10.1186/s12864-018-5408-0 [DOI] [PMC free article] [PubMed] [Google Scholar]
  201. Vanhille L, Griffon A, Maqbool MA, Zacarias-Cabeza J, Dao LTM, Fernandez N, Ballester B, Andrau JC, Spicuglia S (2015) High-throughput and quantitative assessment of enhancer activity in mammals by CapStarr-seq. Nat Commun 6:6905. 10.1038/ncomms7905 [DOI] [PubMed] [Google Scholar]
  202. Vanzan L, Soldati H, Ythier V, Anand S, Braun SMG, Francis N, Murr R (2021) High throughput screening identifies SOX2 as a super pioneer factor that inhibits DNA methylation maintenance at its binding sites. Nat Commun 12:3337. 10.1038/s41467-021-23630-x [DOI] [PMC free article] [PubMed] [Google Scholar]
  203. Visel A, Prabhakar S, Akiyama JA, Shoukry M, Lewis KD, Holt A, Plajzer-Frick I, Afzal V, Rubin EM, Pennacchio LA (2008) Ultraconservation identifies a small subset of extremely constrained developmental enhancers. Nat Genet 40:158–160. 10.1038/ng.2007.55 [DOI] [PMC free article] [PubMed] [Google Scholar]
  204. Vockley CM, Guo C, Majoros WH, Nodzenski M, Scholtens DM, Hayes MG, Lowe WL, Reddy TE (2015) Massively parallel quantification of the regulatory effects of noncoding genetic variation in a human cohort. Genome Res 25:1206–1214. 10.1101/gr.190090.115 [DOI] [PMC free article] [PubMed] [Google Scholar]
  205. Wang J, Zhuang J, Iyer S, Lin X, Whitfield TW, Greven MC, Pierce BG, Dong X, Kundaje A, Cheng Y et al. (2012) Sequence features and chromatin structure around the genomic regions bound by 119 human transcription factors. Genome Res 22:1798–1812. 10.1101/gr.139105.112 [DOI] [PMC free article] [PubMed] [Google Scholar]
  206. Wang X, He L, Goggin SM, Saadat A, Wang L, Sinnott-Armstrong N, Claussnitzer M, Kellis M (2018) High-resolution genome-wide functional dissection of transcriptional regulatory regions and nucleotides in human. Nat Commun 9:5380. 10.1038/s41467-018-07746-1 [DOI] [PMC free article] [PubMed] [Google Scholar]
  207. Waters CT, Gisselbrecht SS, Sytnikova YA, Cafarelli TM, Hill DE, Bulyk ML (2021) Quantitative-enhancer-FACS-seq (QeFS) reveals epistatic interactions among motifs within transcriptional enhancers in developing Drosophila tissue. Genome Biol 22:348. 10.1186/s13059-021-02574-x [DOI] [PMC free article] [PubMed] [Google Scholar]
  208. Wefald FC, Devlin BH, Williams RS (1990) Functional heterogeneity of mammalian TATA-box sequences revealed by interaction with a cell-specific enhancer. Nature 344:260–262. 10.1038/344260a0 [DOI] [PubMed] [Google Scholar]
  209. Weingarten-Gabbay S, Elias-Kirma S, Nir R, Gritsenko AA, Stern-Ginossar N, Yakhini Z, Weinberger A, Segal E (2016) Comparative genetics. Systematic discovery of cap-independent translation sequences in human and viral genomes. Science 351. 10.1126/science.aad4939 [DOI] [PubMed] [Google Scholar]
  210. Weingarten-Gabbay S, Nir R, Lubliner S, Sharon E, Kalma Y, Weinberger A, Segal E (2019) Systematic interrogation of human promoters. Genome Res 29:171–183. 10.1101/gr.236075.118 [DOI] [PMC free article] [PubMed] [Google Scholar]
  211. White MA, Myers CA, Corbo JC, Cohen BA (2013) Massively parallel in vivo enhancer assay reveals that highly local features determine the cis-regulatory function of ChIP-seq peaks. Proc Natl Acad Sci USA 110:11952–11957. 10.1073/pnas.1307449110 [DOI] [PMC free article] [PubMed] [Google Scholar]
  212. White MA, Kwasnieski JC, Myers CA, Shen SQ, Corbo JC, Cohen BA (2016) A simple grammar defines activating and repressing cis-regulatory elements in photoreceptors. Cell Rep 17:1247–1254. 10.1016/j.celrep.2016.09.066 [DOI] [PMC free article] [PubMed] [Google Scholar]
  213. Wissink EM, Fogarty EA, Grimson A (2016) High-throughput discovery of post-transcriptional cis-regulatory elements. BMC Genomics 17:177. 10.1186/s12864-016-2479-7 [DOI] [PMC free article] [PubMed] [Google Scholar]
  214. Won K-J, Chepelev I, Ren B, Wang W (2008) Prediction of regulatory elements in mammalian genomes using chromatin signatures. BMC Bioinformatics 9:547. 10.1186/1471-2105-9-547 [DOI] [PMC free article] [PubMed] [Google Scholar]
  215. Wong ES, Zheng D, Tan SZ, Bower NI, Garside V, Vanwalleghem G, Gaiti F, Scott E, Hogan BM, Kikuchi K et al. (2020) Deep conservation of the enhancer regulatory code in animals. Science 370:eaax8137. 10.1126/science.aax8137 [DOI] [PubMed] [Google Scholar]
  216. Woolfe A, Goodson M, Goode DK, Snell P, McEwen GK, Vavouri T, Smith SF, North P, Callaway H, Kelly K et al. (2004) Highly conserved non-coding sequences are associated with vertebrate development. PLoS Biol 3:e7. 10.1371/journal.pbio.0030007 [DOI] [PMC free article] [PubMed] [Google Scholar]
  217. Wu M-R, Nissim L, Stupp D, Pery E, Binder-Nissim A, Weisinger K, Enghuus C, Palacios SR, Humphrey M, Zhang Z et al. (2019) A high-throughput screening and computation platform for identifying synthetic promoters with enhanced cell-state specificity (SPECS). Nat Commun 10:2880. 10.1038/s41467-019-10912-8 [DOI] [PMC free article] [PubMed] [Google Scholar]
  218. Yan C, Chen H, Bai L (2018) Systematic study of nucleosome-displacing factors in budding yeast. Mol Cell 71:294–305.e4. 10.1016/j.molcel.2018.06.017 [DOI] [PMC free article] [PubMed] [Google Scholar]
  219. Yáñez-Cuna JO, Arnold CD, Stampfel G, Boryń ŁM, Gerlach D, Rath M, Stark A (2014) Dissection of thousands of cell type specific enhancers identifies dinucleotide repeat motifs as general enhancer features. Genome Res 24:1147–1156. 10.1101/gr.169243.113 [DOI] [PMC free article] [PubMed] [Google Scholar]
  220. Yao L, Liang J, Ozer A, Leung AK-Y, Lis JT, Yu H (2022) A comparison of experimental assays and analytical methods for genome-wide identification of active enhancers. Nat Biotechnol 40:1056–1065. 10.1038/s41587-022-01211-7 [DOI] [PMC free article] [PubMed] [Google Scholar]
  221. Yartseva V, Takacs CM, Vejnar CE, Lee MT, Giraldez AJ (2017) RESA identifies mRNA-regulatory sequences at high resolution. Nat Methods 14:201–207. 10.1038/nmeth.4121 [DOI] [PMC free article] [PubMed] [Google Scholar]
  222. Yip KY, Cheng C, Gerstein M (2013) Machine learning and genome annotation: a match meant to be? Genome Biol 14:205. 10.1186/gb-2013-14-5-205 [DOI] [PMC free article] [PubMed] [Google Scholar]
  223. You JS, Kelly TK, De Carvalho DD, Taberlay PC, Liang G, Jones PA (2011) OCT4 establishes and maintains nucleosome depleted regions that provide additional layers of epigenetic regulation of its target genes. Proc Natl Acad Sci U S A 108:14497–14502. 10.1073/pnas.1111309108 [DOI] [PMC free article] [PubMed] [Google Scholar]
  224. Yu TC, Liu WL, Brinck MS, Davis JE, Shek J, Bower G, Einav T, Insigne KD, Phillips R, Kosuri S, Urtecho G (2021) Multiplexed characterization of rationally designed promoter architectures deconstructs combinatorial logic for IPTG-inducible systems. Nat Commun 12:325. 10.1038/s41467-020-20094-3 [DOI] [PMC free article] [PubMed] [Google Scholar]
  225. Yuan G-C, Liu Y-J, Dion MF, Slack MD, Wu LF, Altschuler SJ, Rando OJ (2005) Genome-scale identification of nucleosome positions in S. cerevisiae. Science 309:626–630. 10.1126/science.1112178 [DOI] [PubMed] [Google Scholar]
  226. Zabidi MA, Arnold CD, Schernhuber K, Pagani M, Rath M, Frank O, Stark A (2015) Enhancer–core-promoter specificity separates developmental and housekeeping gene regulation. Nature 518:556–559. 10.1038/nature13994 [DOI] [PMC free article] [PubMed] [Google Scholar]
  227. Zaret KS, Carroll JS (2011) Pioneer transcription factors: establishing competence for gene expression. Genes Dev 25:2227–2241. 10.1101/gad.176826.111 [DOI] [PMC free article] [PubMed] [Google Scholar]
  228. Zeigler RD, Cohen BA (2014) Discrimination between thermodynamic models of cis-regulation using transcription factor occupancy data. Nucleic Acids Res 42:2224–2234. 10.1093/nar/gkt1230 [DOI] [PMC free article] [PubMed] [Google Scholar]
  229. Zhang Q, Yoon Y, Yu Y, Parnell EJ, Garay JA, Mwangi MM, Cross FR, Stillman DJ, Bai L (2013) Stochastic expression and epigenetic memory at the yeast HO promoter. Proc Natl Acad Sci U S A 110:14012–14017. 10.1073/pnas.1306113110 [DOI] [PMC free article] [PubMed] [Google Scholar]
  230. Zhao W, Pollack JL, Blagev DP, Zaitlen N, McManus MT, Erle DJ (2014) Massively parallel functional annotation of 3’ untranslated regions. Nat Biotechnol 32:387–391. 10.1038/nbt.2851 [DOI] [PMC free article] [PubMed] [Google Scholar]
  231. Zhuo Z, Yu Y, Wang M, Li J, Zhang Z, Liu J, Wu X, Lu A, Zhang G, Zhang B (2017) Recent advances in SELEX technology and aptamer applications in biomedicine. Int J Mol Sci 18. 10.3390/ijms18102142 [DOI] [PMC free article] [PubMed] [Google Scholar]

Data Availability Statement

Not applicable for this publication.