Author manuscript; available in PMC: 2020 Apr 15.
Published in final edited form as: Methods. 2019 Feb 20;159-160:115–123. doi: 10.1016/j.ymeth.2019.02.017

Functional assays for transcription mechanisms in high-throughput

Chenxi Qiu a,b,c, Craig D Kaplan d
PMCID: PMC6589137  NIHMSID: NIHMS1523251  PMID: 30797033

Abstract

Dramatic increases in the scale of programmed synthesis of nucleic acid libraries coupled with deep sequencing have powered advances in understanding nucleic acid and protein biology. Biological systems centering on nucleic acids or encoded proteins greatly benefit from such high-throughput studies, given that large DNA variant pools can be synthesized and DNA or RNA products of transcription can be easily analyzed by deep sequencing. Here we review the scope of various high-throughput functional assays for studies of nucleic acids and proteins in general, followed by discussion of how these types of study have yielded insights into the RNA Polymerase II (Pol II) active site as an example. We discuss methodological considerations in the design and execution of these experiments that should be valuable to studies in any system.

1. Introduction

Highly-parallel DNA synthesis and high-throughput DNA sequencing technologies allow tremendous insights into sequence-activity relationships at the DNA, RNA, and protein level. The quantification of DNA sequence variants in a population of molecules by deep sequencing allows the study of such variants in a highly parallel fashion. If changes in the abundance of a nucleic acid sequence in a population relate to any functional output, deep sequencing can be adapted to study mechanisms relating to that output. Therefore, while deep sequencing-derived methods are particularly powerful for studying gene expression-related phenomena (e.g. transcription, alternative splicing, translation, and other RNA related mechanisms), they can also be adapted to study various protein functions as long as functional output correlates with changes in the abundance of particular sequences. Second, array-based DNA oligonucleotide synthesis approaches have greatly expanded the scale of synthesizing oligo pools of defined sequences, reaching tens of thousands to sub-million variants in a pool [1–3]. Given the ability of deep sequencing to relatively accurately determine each sequence variant frequency in a pool, many high-throughput assays have been established to couple massive variant pools with deep sequencing to study complex biological phenomena (Figure 1). In this review, we discuss examples of such assays applied to study fundamental mechanisms of transcription, with a few additional examples in molecular evolution.

Figure 1. General experimental schematics of high-throughput assays coupling pooled oligo variant library with deep sequencing.

A pooled variant oligonucleotide library (each variant represented by a star of distinct color) is first cloned and introduced into cells. A selection assay, which could be fitness, gene expression, or protein activity-based, is subsequently performed on the pooled variant cells, which will alter the variant distribution meaningfully if variants have differential activity towards the selected criterion. In this example, red variants are enriched, while blue variants are depleted. Deep sequencing is applied on the pool before and after selection to quantify the changes in variant frequency. Different normalization approaches are applied in different experimental setups.

Eukaryotic mRNA is synthesized by a twelve-subunit protein complex RNA polymerase II (Pol II) in a multi-step process including initiation, elongation, and termination steps. Initiation, where Pol II is recruited to promoter DNA and the transcription start site is made available, is intensely regulated by transcription factors working through cis-regulatory elements (CREs), their cofactors, and the chromatin state of promoter DNA. Transcriptional output depends on diverse and complex input, full understanding of which benefits from large-scale quantitative measurement of many factors.

1.1. Massive sequence libraries to study promoter/enhancer mechanisms

High-throughput reporter-based assays to study transcription mechanisms come in many flavors. For example, the development of massively parallel reporter assays (MPRA) has enabled experimental characterization of tens of thousands of promoters and enhancers [4–30]. To measure promoter activities, a basic setup clones a large number of pooled promoter variants in place upstream of a single reporter gene and measures the expression output by deep sequencing [4, 7, 12, 14, 15, 17, 19, 24–28] (also reviewed in [9, 28]). The key challenge here is to link an expressed RNA to each promoter variant, since variant sequences are upstream of or absent in the transcribed RNA that is sequenced. To this end, unique molecular barcodes are cloned in the transcribed region of the reporter gene to index each promoter variant, such that enrichment or depletion of the barcodes quantified by deep sequencing reflects the reporter gene expression under the linked promoter variant. Barcodes are also useful in that sequencing can be targeted to these molecular “tags”, such that the sequenced region is generally small and robust to sequencing errors. Critical experimental parameters and caveats regarding linking unique barcodes to variants are discussed further below.

MPRAs have been employed mainly to measure activity of enhancers and promoters and their variants, with the additional power to examine transcription start site (TSS) usage under particular experimental designs. To measure enhancer/promoter activities, pooled enhancer variants are linked to a weak promoter that itself does not drive sufficient expression of a reporter gene without enhancement [6, 8–11, 13, 29]. To assess TSS usage by variant promoters, an approach called massively systematic transcript end readout (MASTER) has been developed to measure the transcript ends and yields for a large set of randomized putative transcription start sites within a canonical E. coli promoter [31–34]. Sequencing of template DNA is required to definitively link an individual variant with a variant barcode in most CRE-focused MPRA experiments, as only the latter will be expressed as RNA. Importantly, sequencing of DNA templates should also employ a unique molecular barcoding strategy to allow individual variant templates to be quantified. This allows normalization of expression to template level. Variations on MPRA have been reviewed elsewhere or are cited above [9, 28].
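
As an illustration of normalizing expression to template level, the sketch below computes a per-variant activity as the mean log2 ratio of RNA barcode frequency to DNA barcode frequency. The function name, the pseudocount, and the choice to average over barcodes are assumptions made for illustration, not a method prescribed by the studies cited above.

```python
import math
from collections import defaultdict

def mpra_activity(rna_counts, dna_counts, barcode_to_variant, pseudocount=1.0):
    """Mean log2(RNA frequency / DNA frequency) per variant.

    rna_counts, dna_counts: dicts mapping barcode -> read count.
    barcode_to_variant: dict mapping barcode -> variant name (hypothetical input,
    assumed to come from a prior barcode-linkage sequencing step).
    """
    rna_total = sum(rna_counts.values())
    dna_total = sum(dna_counts.values())
    per_variant = defaultdict(list)
    for barcode, variant in barcode_to_variant.items():
        dna = dna_counts.get(barcode, 0)
        if dna == 0:
            continue  # barcode absent from the template pool: cannot normalize
        rna = rna_counts.get(barcode, 0)
        rna_freq = (rna + pseudocount) / rna_total
        dna_freq = (dna + pseudocount) / dna_total
        per_variant[variant].append(math.log2(rna_freq / dna_freq))
    # Averaging over barcodes dampens barcode-specific effects.
    return {v: sum(xs) / len(xs) for v, xs in per_variant.items()}
```

A variant whose barcodes are over-represented in RNA relative to DNA receives a positive score; the pseudocount only guards against zero RNA counts and should be tuned to the sequencing depth.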

1.2. Massive sequence libraries to study RNA-level mechanisms

The ability to precisely quantify RNAs by next-generation sequencing has enabled other applications using massive sequence libraries to study RNA-level mechanisms. For example, a massive synthetic library comprising two million synthetic mini-genes has been assayed and used to train a model that learns sequence motifs for alternative splicing [20]. In this experiment, the idea was to use variants to relearn rules of alternative splicing, model splicing efficiency from library variants, and use the model to predict effects of observed sequence variation in humans on splice-site usage. Splicing has also been examined in other studies [35, 36]. 3’ UTR sequence libraries have been used to study sequence rules of 3’ UTR contribution to mRNA stability in yeast, human cells and zebrafish embryos, and to miRNA regulation in human cells [18, 30, 37–40]. Furthermore, a few large sets of truncated or natural 5’ UTR variants have been assessed for their contribution to ribosome loading and translation [5, 22, 41, 42]. Quantitative or semi-quantitative models trained using these experimental datasets can be informative for understanding the critical sequence contributions for these complex biological processes, with longer-term goals of being predictive of effects of naturally occurring sequence variants in populations [5, 18, 20, 22, 30, 35–44].

1.3. Massive sequence libraries to study Structure/Function relationships

Above we have discussed deep sequencing applications to understanding sequence-activity relationships at the DNA or RNA level. For RNAs or proteins, deep mutational scanning has emerged as a powerful approach to study structure/function relationships [5, 45–76]. In a deep mutational scanning experiment, a large set of mutants are pooled for a selection or selections that relate to functional output(s) linked to a protein or proteins of interest, after which strengths of phenotypes are assessed by counting allele frequency changes in the variant pool by deep sequencing. Different selection approaches have been designed such that a specific protein property (contribution to fitness, protein thermo-stability, ligand binding etc.) can be assayed [45, 47–50, 54, 66]. Deep mutational scanning has been widely employed to study fundamental questions in protein or tRNA structure-function [46–51, 53–55, 59, 60, 62, 64, 68, 69, 71, 72, 74, 75] and molecular evolution [48, 61, 73, 76], as well as in engineering efforts such as protein design and directed evolution [52, 56, 57, 66]. In a later section, we will briefly discuss some insights into molecular evolution from recent deep mutational scanning studies, with particular focus on yeast Pol II and transcription mechanism as examples. First, we will discuss practical aspects of utilizing such approaches.

2. How to do it

2.1. Obtaining libraries

Multiple options for pooled oligo variant libraries are commercially available. Most simply, oligonucleotides containing a randomized region, where mixed deoxynucleotides are incorporated into user-defined specific positions, can give massive diversity. In contrast, an oligo pool of user-defined sequences allows precise control of the library composition, but with less diversity given the synthesis limit of an array. It is also important to understand that the current length limits for oligo synthesis are generally ~150–200 nt, though advances will likely allow longer sequences to be synthesized in the near future (it is currently possible to combine multiple smaller oligo libraries by standard approaches). Each of these options has its own caveats, with a general concern being deviation of the library from the expected composition. For randomized libraries, proper evaluation of diversity is critical, as randomization in the oligo synthesis process may not be truly random. It is critical to consult with the vendor on how synthesis substrates are mixed during synthesis. For any given randomized position, different deoxynucleotides can be either mixed by hand prior to the synthesis reaction or injected quickly but sequentially during ongoing synthesis on the machine. The latter approach, although injection of substrates is fast, will still lead to bias towards the substrates injected earlier. Proper assessment of randomness is recommended in cases where ultra-high diversity is critical. A shallow sequencing run, such as Illumina MiSeq Nano (1–2 million reads), can be used to provisionally assess diversity at the base composition level in the randomized region.
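
A provisional check of base composition in a randomized region can be sketched as follows. This is a minimal illustration that assumes reads have already been trimmed and aligned so that the randomized window starts at a fixed offset; the function name and arguments are hypothetical.

```python
from collections import Counter

def base_composition(reads, start, length):
    """Per-position A/C/G/T frequencies across a randomized window.

    reads: iterable of read sequences (strings).
    start, length: 0-based offset and size of the randomized window
    (assumed identical in every read).
    """
    counts = [Counter() for _ in range(length)]
    for read in reads:
        window = read[start:start + length]
        if len(window) < length:
            continue  # read too short to cover the window
        for i, base in enumerate(window):
            counts[i][base] += 1
    freqs = []
    for c in counts:
        total = sum(c.values())  # includes ambiguous calls such as N
        freqs.append({b: c[b] / total for b in "ACGT"} if total else {})
    return freqs
```

For an unbiased fully randomized position, each base would be expected near 0.25; a consistent skew towards one base across positions may point to the substrate mixing issues described above.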

For programmed libraries of defined sequences, error rate is of importance. Given the nature of oligo synthesis, errors will increase as library length increases. For example, for a 200-nt library, an error rate as low as 0.1% for each base would theoretically cause 18% (1–0.999^200) of oligos to be erroneous. If the error rate is only slightly higher (0.2%) for each base, about one third (33%; 1–0.998^200) of oligos will have errors. The fraction of erroneous oligos may have a negative impact depending on the experiment, the diversity needed, and the depth of library sampling and sequencing employed. Tools for error correction on synthesized oligo pools are available [77]. For deep mutational scanning experiments, a common class of synthesis error, insertion or deletion, will cause frameshifts in coding regions. One approach to minimize insertion or deletion errors is to first clone synthesized oligo pools in-frame with a selectable marker (e.g. antibiotic resistance in bacteria; see [71, 78, 79]). Frameshift mutations will be selected against in such an approach, but it is important to sample libraries deeply enough at such a cloning step so that library diversity is maintained. Attention to depth of library sampling at any potential bottleneck is important to maintain library diversity.
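
The dependence of the erroneous fraction on oligo length and per-base error rate can be explored with a short calculation. The sketch assumes errors occur independently at each position with the same probability, which is a simplification of real synthesis chemistry.

```python
def fraction_with_errors(length_nt, per_base_error):
    """Expected fraction of oligos carrying at least one synthesis error.

    Assumes independent, identical per-base error probabilities, so the
    error-free fraction is (1 - p) ** L.
    """
    return 1.0 - (1.0 - per_base_error) ** length_nt

if __name__ == "__main__":
    for rate in (0.001, 0.002, 0.005):
        frac = fraction_with_errors(200, rate)
        print(f"per-base error {rate:.1%}: {frac:.0%} of 200-nt oligos erroneous")
```

Because the relationship is exponential in length, shortening the variable region or splitting a design across several shorter oligos can reduce the erroneous fraction substantially.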

Programmed or randomized libraries are generally designed with constant flanking sequence for standard cloning. When designing libraries with defined sequences, multiple libraries can be flanked by distinct priming sequences so that they can be synthesized in one pool but separately amplified by PCR. One should keep in mind that the distinct flanking sequences need to be orthogonal to avoid cross-contamination when amplifying libraries. Experimentally validated orthogonal primers are recommended for this purpose [80–82].

2.2. Working with libraries

Libraries can be cloned using standard approaches with a few critical parameters to keep in mind. First, enzymes for amplifying libraries should have high processivity and fidelity. Potential biases for any enzyme should be understood, potentially controlled for, and minimized (e.g. ligation or PCR amplification bias [83, 84]). Second, the starting template concentration and number of thermocycles for the PCR reactions should be carefully controlled, as over-amplification alters the library diversity. Although using an empirically small number of cycles (4–6 cycles) on highly concentrated starting template library DNA (100–300 ng of short oligo library pool) is one approach, several careful controls can be added to minimize the perturbation of the library diversity. It is highly recommended to titrate the template DNA concentration for the PCR reactions, because overloading template may inhibit PCR and lead to undesired products (e.g. DNA template switching, discussed below). Fluorescence-based quantitative PCR is recommended to estimate the number of cycles required for titrated library template concentration. Quality controls, such as high-sensitivity DNA gel electrophoresis to check library size distributions and shallow sequencing to confirm library composition, are recommended after library amplification.

Third, PCR fidelity of libraries with shared sequence regions (libraries containing short randomized regions with common flanking regions, or protein-coding variant libraries) is commonly compromised by template switching during thermocycling [85–94]. When different templates share homology, some templates may be partially extended during one cycle, and the resulting short product may serve as primer for other templates during additional cycles. The resulting products can be chimeras combining two distinct templates through shared priming sequence. Computational strategies are available to estimate and decrease undesired template switching artifacts to some extent [85, 90, 93]. In addition, adjusting some basic experimental parameters can minimize PCR chimeras, such as increasing extension time to minimize partially extended products or decreasing the number of thermocycles [89]. Water-in-oil emulsion PCR, where individual template DNAs are partitioned into separate reaction droplets within an emulsion, can minimize cross-template switching and enhance evenness of amplification across the library, as GC content, secondary structure and experimental conditions could affect the success of DNA amplification [86–88, 91, 94]. Emulsion PCR should be a consideration if library distribution and integrity are of paramount importance, especially in later amplification steps where variants are amplified subsequent to selection(s). When setting up a protocol to handle sequencing libraries for the first time in a lab, it is highly recommended to sequence libraries to evaluate diversity prior to and post amplification.

2.3. Direct variant identification by deep sequencing

Variant identification and coupling to an index or barcode are generally of primary importance in high-throughput experiments, as a very large number of variants are pooled and require quantitative measurement. One way to identify or quantify sequence variants is to directly sequence the variant region by deep sequencing, meaning the variant itself is counted in the experiment. This is a direct readout of the variant abundance with minimal perturbation. However, if there are many similar variants, noise in quantification will be introduced by sequencing errors. Such errors can be countered by indexing variants with a separate barcode, where barcode diversity is much higher than variant diversity and the barcode design can incorporate error correction strategies (see “Variant linkage to unique molecular barcodes” section below) [95–98]. In cases where PCR amplification is necessary, one should amplify the sequencing libraries following enzymatic parameters similar to those described in the “Working with libraries” section above. Again, one should also evaluate the possibility of template switching and decide if emulsion PCR is necessary when preparing sequencing libraries [86, 87, 91, 92].
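
One simple form of barcode error correction is to rescue a read whose barcode lies within a small Hamming distance of exactly one known barcode. The sketch below is an illustrative assumption, not the specific scheme used in the cited work; dedicated error-correcting barcode designs offer stronger guarantees.

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def correct_barcode(observed, known_barcodes, max_mismatches=1):
    """Return the matching known barcode, or None if ambiguous or unmatched.

    known_barcodes: set of expected barcode sequences (hypothetical whitelist
    derived from a linkage sequencing step).
    """
    if observed in known_barcodes:
        return observed
    hits = [
        k for k in known_barcodes
        if len(k) == len(observed) and hamming(observed, k) <= max_mismatches
    ]
    # Only accept a correction when it is unambiguous.
    return hits[0] if len(hits) == 1 else None
```

This linear scan is adequate for small whitelists; for libraries with very many barcodes, an index of one-mismatch neighbors would be faster.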

2.4. Variant linkage to unique molecular barcodes

Introducing unique barcodes to index each variant is essential under some circumstances. First, when the length of variant region is over the limit of most available deep sequencing approaches, linking unique barcodes reduces the complexity in the sequencing strategy and decreases the artifacts from sequencing errors. Second, barcode linkage is necessary if the measured output is disconnected from the variant region. For example, in the MPRA and MASTER experiments discussed above, the final readout is the level of transcribed RNA, which does not contain the variant region. Therefore, unique barcodes in the transcribed region are necessary for identifying the variants.

Barcoding strategies have two general technical caveats. First, barcode sequences may not be inert. For example, barcodes in the transcribed region may have effects on expression or stability, adding an additional layer of complexity in MPRA experiments. Second, a barcode linking to more than one variant can no longer be used to uniquely identify the variants. A general approach to address both issues is to have a much larger pool of barcodes than the size of the variant library, such that each variant is indexed with multiple unique barcodes and the chance of one barcode linking to multiple variants is minimized. Averaging effects from multiple barcodes thus can reduce barcode-dependent bias.
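
The second caveat can be handled computationally by discarding barcodes that are observed with more than one variant in linkage sequencing. The following sketch assumes that (barcode, variant) pairs have already been extracted from the reads; the thresholds are hypothetical defaults and should be adjusted to the sequencing depth and error rate.

```python
from collections import Counter, defaultdict

def assign_barcodes(pairs, min_reads=2, min_fraction=0.9):
    """Map each barcode to a single variant, dropping ambiguous barcodes.

    pairs: iterable of (barcode, variant) tuples from linkage sequencing.
    A barcode is kept only if it has at least min_reads observations and
    at least min_fraction of them support the same variant.
    """
    seen = defaultdict(Counter)
    for barcode, variant in pairs:
        seen[barcode][variant] += 1
    assigned = {}
    for barcode, variants in seen.items():
        variant, top = variants.most_common(1)[0]
        total = sum(variants.values())
        if total >= min_reads and top / total >= min_fraction:
            assigned[barcode] = variant
    return assigned
```

Keeping several independent barcodes per variant in the resulting map is what later allows barcode-dependent effects to be averaged out.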

When properly indexed and assembled, barcodes can be assigned to each variant by various sequencing strategies. In cases where both variant region and barcode are small, simple Illumina paired-end sequencing can cover variants and barcodes with two reads. For variants over most Illumina sequencing length limits, long-read sequencing (PacBio [99] or Nanopore [100–102]) or haplotype-resolving approaches [103] can be used to link the variants and their unique barcodes; however, long-read approaches may require deeper coverage to reach a consensus due to higher error rates. After barcodes are properly assigned to the variants, barcodes can be amplified and sequenced to decode the variant abundance in later experiments. As a reminder, one should minimize or be aware of processes that can separate assigned barcodes from their originally linked variants.

2.5. Selections in deep mutational scanning experiments

In a deep mutational scanning experiment, oligonucleotides encoding a large number of defined mutants are pooled, cloned and introduced into cells for selection experiments. Given the power of deep sequencing to quantify frequency of each mutant allele in a pool, functional consequences of variants can be assayed by quantifying changes in allele frequency under various experimental conditions [5, 45–76, 78, 79]. Experimental conditions can be fitness-based selections, where variants with fitness benefits or deficits can be quantified because variant function is linked to replicative capacity of the host cell [61, 62, 64, 75]. Alternatively, experimental conditions might simply relate to the expression level of any particular gene [47, 49, 68, 72]. For example, variants affecting abundance of a target protein can be selected by FACS binning on the target protein, so that variants increasing or decreasing the target protein abundance are enriched in different bins [72]. Expression or fitness selection can also be coupled to genetic reporter-based assays such as bacterial two-hybrid [48, 73] or phage and yeast display [45, 51, 52, 104–107] to assay various basic functional aspects of proteins.
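
Changes in allele frequency are often summarized as a log ratio of post-selection to pre-selection counts. The sketch below normalizes each variant to wild type, which is only one of several normalization choices; the function name, the pseudocount, and the wild-type reference are assumptions for illustration.

```python
import math

def enrichment_scores(pre_counts, post_counts, wild_type, pseudocount=0.5):
    """log2 enrichment of each variant relative to wild type.

    pre_counts, post_counts: dicts mapping variant -> read count before and
    after selection. Every scored variant must be present in pre_counts.
    """
    def ratio(variant):
        post = post_counts.get(variant, 0) + pseudocount
        pre = pre_counts[variant] + pseudocount
        return post / pre

    wt_ratio = ratio(wild_type)
    # Dividing by the wild-type ratio cancels differences in sequencing depth.
    return {v: math.log2(ratio(v) / wt_ratio) for v in pre_counts}
```

Under this convention, wild type scores 0, enriched variants score above 0, and depleted variants score below 0.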

3. Discussion

3.1. What can we learn from saturating mutagenesis?

Structure/function relationships are complicated by the enormity of sequence space. In proteins, amino acid residues are the basic building blocks that collectively contribute to protein function. Functional deconvolution of protein structures heavily relies on mutational approaches. Individual mutational approaches such as traditional alanine-scanning have revealed much, but are limited by the diversity of the substituted variants [108]. In addition, perturbing one or a limited number of amino acids at a time often encounters complexities from functional interactions among the perturbed residues and the background residues. The non-linear nature of phenotypes and epistasis makes quantitative prediction of untested epistasis from a subset of total mutation combinations difficult [109–112]. Therefore, care must be taken in interpretation and prediction from incomplete networks. Deep mutational scanning allows sampling of a much larger number of mutations across the sequence space, allowing exploration of many questions that were previously difficult to test. Examples are discussed below.

Saturating mutagenesis for single substitutions of a protein or a critical protein domain often leads to global insights missing from examination of a limited number of variants. In general, any particular mutation represents the combination of two effects: loss of the substituted residue side chain and gain of the substituting residue side chain, and follow-on effects of each of these may be dependent on environmental conditions or protein background. Assaying more than one substitution under a particular condition is almost always necessary to properly interpret mutational effects, and saturating mutagenesis adds the further benefit of evaluating the effects from different side-chain classes (hydrophobic, polar, charged etc.) at the same position. Each added side chain could potentially impact the protein structure in one or many aspects, including but not limited to residue-residue interactions, propensity for local secondary structure, or biochemical character (hydrophobicity or other) of the nearby environment. It is thus not entirely surprising that, in many existing saturating mutagenesis studies, different substitutions at the same amino acid residue lead to diverse effects for most, if not all, residues [5, 45–76, 78, 79] (reviewed in [108]).

3.2. Evolution/epistasis- how do interaction networks drive protein function?

The scale of deep mutational scanning has also made it possible to test fundamental questions of residue functional interaction networks and epistasis among residues that control function and functional plasticity of proteins. Essentially, these are questions of protein function intimately linked to molecular evolution. Interactions between residues and residue functional dependence on identities or behavior of other residues underlie observations of epistasis, that is, the departure from additivity of phenotypes upon combinations of substitutions. Epistasis, or co-evolution among residues, can illuminate relationships underlying a protein’s function and can be detected by statistical coupling analysis (SCA) or other multiple-sequence alignment-based approaches [113–116]. These approaches predict functional epistasis among co-evolved residues, i.e. the identities of side chains at specific positions in a protein are constrained because they function together or are mutually exclusive of other residue combinations. Extensive work from the Ranganathan lab has demonstrated that co-evolving residues can form a physically continuous network (termed “sectors”) that contributes to protein folding, stability, catalysis, substrate specificity and allostery [48, 113, 114, 116–121]. The statistical coupling observed across evolution has only recently been comprehensively tested by experiment, owing to advances in deep mutational scanning. Salinas and Ranganathan recently adapted the deep mutational scanning approach to measure many thousands of pairwise couplings in several PDZ domain protein homologs and largely recapitulated the functionally constrained sector residues from co-evolutionary analyses [122], providing the first comprehensive experimental evidence for epistasis among co-evolving residues.
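
The definition of epistasis as a departure from additivity can be made concrete with a small calculation. The sketch assumes phenotype scores are on a scale where independent effects add (for example, log-transformed enrichment); this scale choice, the function name, and the example values are assumptions for illustration.

```python
def epistasis(score_a, score_b, score_ab, score_wt=0.0):
    """Deviation of a double mutant from the additive expectation.

    score_a, score_b: single-mutant scores; score_ab: double-mutant score;
    score_wt: wild-type score on the same scale.
    Returns 0 for purely additive effects.
    """
    expected = score_wt + (score_a - score_wt) + (score_b - score_wt)
    return score_ab - expected

if __name__ == "__main__":
    # Two deleterious single mutants whose combination is no worse than either alone.
    print(epistasis(score_a=-1.0, score_b=-1.0, score_ab=-1.0))
```

A value near zero indicates that the two substitutions act independently, whereas a clear deviation suggests that the residues depend on one another functionally.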

Given the strong functional constraints on co-evolving residues, a relevant question is whether the evolved enzymes represent the only and optimal trajectory from their ancestors. Recent work from the Thornton lab has explicitly tested evolutionary trajectories alternative to the statistically inferred evolutionary path from a phylogenetically reconstructed ancestral transcription factor [76]. The scale of deep mutational scanning has led to the striking discovery that hundreds of alternative protein sequences could have performed similar derived function using diverse biochemical mechanisms [76]. They have also found that all these alternatives require diverse prior neutral permissive mutations, consistent with the hypothesis that epistasis between residues affects enzymatic evolvability to drive molecular evolution (for reviews, see [123, 124]), along with other experimental data [61, 76, 121].

Deep mutational scanning experiments have also been employed to study other important “sector” functions and the origin of allostery. In one case study, saturation mutagenesis has shown that sector residues, although more sensitive to perturbations than non-sector positions, provide a physical basis for the functional adaptation to a new ligand for a PDZ domain protein [48]. In another case study on TEM-1 β-lactamase, a protein that confers ampicillin resistance to bacteria, Stiffler et al. profiled nearly all possible single-substitution mutants under different ampicillin doses, and found that tolerance to mutations in this enzyme largely depends on the strength of purifying selection [61, 65]. Interestingly, mutations that are neutral under low selection pressure but deleterious under high pressure were critical for the enzyme to evolve a new function (resistance to cefotaxime). Follow-up structural experiments have suggested that some of these “conditionally neutral” mutations affect evolvability by allosterically impacting active site conformational plasticity [121]. This series of studies provides evidence that evolution of novel enzymatic activities can depend on peripheral context mutations through allosteric communication, leading to the proposal that an original function of allostery may be for shaping enzymatic plasticity and evolvability, instead of promoting function [61, 65, 121].

3.3. Mechanistic insights from deep mutational scanning experiments: Yeast Pol II as an example

In this section, we will use yeast Pol II as an example to discuss basics of residue functions and epistasis, highlighting the types of observations that can be made by mutational scanning. Pol II has a conserved active site domain called the trigger loop (TL) that balances Pol II transcriptional activity and fidelity through conformational cycles of opening and closing [75, 125–145]. Interesting questions critical for understanding RNA polymerase mechanisms arise from study of the TL, while the nature of the TL’s function affects how mutations might be interpreted. First, the TL can fold into a number of conformations [125–130, 134, 146–152]; therefore, any individual substitution can have effects on multiple conformations, and phenotypes will be the sum of these effects across conformations. Second, the TL is highly conserved across kingdoms of life, yet roles for the diversity in sequence that is observed among species are not well understood, nor is the extent of the conservation of function for conserved residues known. The TL swings between an open, catalytically-disfavoring state and a closed, substrate-activated, catalytically-favoring state during nucleotide addition cycles, and this conformational cycling is impacted by intra-TL residue-residue interactions and TL interactions with nearby domains in addition to substrate interactions [75, 125–129, 132, 134, 136–138]. Initial mutational studies for Pol II mutants in yeast coupled to phenotyping revealed two major TL mutant classes: one class predicted to impair TL interactions with nucleotide substrates, leading to decreased catalytic efficiency (loss-of-function (LOF)); the other class putatively altering the TL towards the closed, active state, correlating with increased catalytic efficiency but decreased fidelity in vitro (gain-of-function (GOF)). A series of genetic reporter phenotypes correlate well with altered transcription elongation rates in vitro and specific transcription defects in vivo [75, 127, 132, 153], thus allowing us to predict the biochemical behavior (GOF or LOF) from genetic assays.

Coupling genetic assays with mutational scanning, we profiled almost all possible single substitution TL variants in Saccharomyces cerevisiae Pol II (Sce Pol II)[75]. Our saturation mutagenesis revealed distinct distributions of many GOF and LOF mutants over the entire TL region (Pol II Rpb1 1076–1106, Sce Pol II Rpb1 numbering unless indicated otherwise)[75]. First, some ultra-conserved positions tolerate many substitutions (substitutions are viable), while others tolerate almost none, indicating the wide range of importance for residues equally conserved across evolution. Second, as a caution for the interpretation of function from limited numbers of substitutions, we will discuss the residue pair Rpb1 T1095 and E1103. E1103G was originally identified in a genetic screen from the Strathern lab and shown to increase Pol II elongation rate [153]. Crystal structures indicated potential contacts between E1103 and T1095 when the TL is open and inactive. This led to a model that 1095–1103 interaction supports the inactive TL conformation and its loss promotes TL closing, making the enzyme hyperactive. This was originally tested by showing that T1095G phenocopied E1103G [128]. However, specific additional mutations (e.g. T1095A) and saturating mutagenesis indicate that this contact is not likely critical for the open TL conformation [132]. Almost all E1103 substitutions are genetically GOF, save proline, while almost none of T1095 substitutions are [75]. These results suggest that substitution of T1095 requires residues that gain a new or unique characteristic to promote hyperactivity, while loss of glutamate at 1103 causes loss of a negative role in catalysis for that residue. Third, saturating mutagenesis revealed new, unexpectedly viable substitutions in residues difficult to substitute. 
One such mutant, H1085L, has negligible growth defects under standard conditions, arguing against previous proposals that the ultra-conserved histidine at 1085 functions as a general acid during catalysis as leucine of course does not function for proton donation [125, 129, 154]. The function of H1085L remains to be determined in biochemical experiments, but its phenotype suggests that any potential function of H1085 as a general acid may be entirely bypassed, calling into question the role of this histidine during catalysis. In addition, were H1085 to act as a general acid, substitutions at H1085 would be expected to alter the Pol II enzymatic pH profile. In line with our genetic data, altered pH dependence of H1085 variants appeared to be complicated and controversial [125, 130, 131, 135, 154]. Finally, a recent report reconciles the genetic and biochemical evidence, and proposes that H1085 acts as a positional catalyst [139], explaining how the apparently appropriately shaped and sized leucine can uniquely substitute for histidine at 1085. Fourth, the tolerance of conserved residues to substitutions in different species indicates the prevalence of epistasis. As an example, eukaryotic multi-subunit polymerases mostly always have a leucine at analogous positions to 1081 [155]. In contrast, E.coli and many other bacteria have a methionine in this position, however leucine is also widely observed [156, 157]. This residue is predicted to stack against the substrate base moiety and essentially test for base-pairing of substrates positioned for catalysis [125, 126]. Substitution of leucine for the methionine in Thermus aquaticus or E.coli RNAPs is well-tolerated[130, 158], as would be predicted from the otherwise high conservation of surrounding residues and the appearance of both leucine and methionine in bacterial MSAs[157]. 
However, in yeast Pol II, methionine, which would be predicted to serve the same function as leucine in the context of the highly conserved TL, is very poorly tolerated at position 1081 [75]. This result is interpreted to mean that other residues within the protein constrain protein function such that leucine is now uniquely favored in the yeast (or possibly eukaryotic) lineage; that is, it is an example of epistasis that might not otherwise have been predicted from examination of related or conserved systems.

Protein residues work together to support protein function. Given this simple idea, it is important to realize that mutant phenotypes likewise result from a mutant residue acting within a protein's network of interactions, and that network function itself can be fundamentally altered by mutation. Combinatorial mutant analysis can reveal functional distinctions between residues with similar single mutant phenotypes [75, 132]. Furthermore, it can determine how strongly the phenotype of an individual substitution depends on the identity of other residues. In our original attempts to combine tens of TL mutants by standard approaches, we found that TL mutants conferred activity-dependent genetic interactions: GOF and LOF mutants were mutually suppressive, while combinations of mutants of the same class (GOF+GOF or LOF+LOF) were generally additive (phenotypic exacerbation or synthetic lethality) [132] (Figure 2). The activity-dependent additive genetic interactions suggested that most of the tested TL mutants independently altered Pol II function. However, we observed non-additive, epistatic interactions among viable substitutions in residues N479, Q1078 and N1082, where single mutants all conferred LOF, yet combinations did not exacerbate the phenotype (Figure 2), suggesting functional dependence of these three residues [132]. This observation highlights the non-independent, epistatic interactions among a small subset of TL residues, and reveals a complex functional network of residues within the Pol II active site. Finally, we observed GOF behavior in substitutions that appeared to require additional residues to be present (i.e. sign epistasis).
For example, F1084I has a GOF phenotype on its own, and behaves as GOF in combination with other GOF substitutions (it exacerbates their phenotypes) or with specific LOF substitutions (it suppresses the LOF phenotype). This is not the case when F1084I is combined with a LOF H1085 substitution [132] (Figure 2). Instead of the suppression expected under an additive model, this combination is lethal. This observation is consistent with a model in which F1084I GOF activity works through the histidine at 1085, so that when this histidine is substituted, F1084I is converted to a LOF substitution. Our original observations were possible because we chose to perform combinatorial mutant analysis and we happened to combine an informative set of substitutions.
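The logic of comparing a double mutant with the expectation from its single mutants can be illustrated with a minimal sketch. It assumes a hypothetical signed, activity-like score (0 for WT, positive for GOF, negative for LOF) under which independent effects simply add; this scale, the tolerance, the function name and all numeric values are illustrative assumptions and are not the scoring scheme or data of the studies cited above.

```python
def classify_interaction(score_a, score_b, observed_double, tol=0.25):
    """Label a double mutant relative to a simple additive expectation.

    Scores are hypothetical signed, activity-like values:
    0 = WT, > 0 = GOF, < 0 = LOF (an assumption made for illustration).
    """
    expected = score_a + score_b
    # Independent effects: the double mutant matches the sum of the singles.
    # GOF+LOF then lands near WT (mutual suppression); GOF+GOF or LOF+LOF
    # lands further from WT (exacerbation).
    if abs(observed_double - expected) <= tol:
        return "additive"
    same_class = score_a * score_b > 0
    strongest_single = max(abs(score_a), abs(score_b))
    # Same-class singles whose combination is no worse than the stronger single.
    if same_class and abs(observed_double) <= strongest_single + tol:
        return "epistasis"
    # Opposite-class singles whose combination is worse than either single,
    # as if one substitution had switched class in the other's background.
    if not same_class and abs(observed_double) > strongest_single + tol:
        return "sign epistasis"
    return "non-additive (unclassified)"


# Hypothetical scores chosen only to exercise each branch.
print(classify_interaction(1.0, -1.0, 0.1))    # GOF + LOF, suppression
print(classify_interaction(1.0, 1.0, 2.1))     # GOF + GOF, exacerbation
print(classify_interaction(-1.0, -1.0, -1.1))  # LOF + LOF, no exacerbation
print(classify_interaction(1.0, -1.5, -2.5))   # GOF + LOF, worse than either
```

In a real screen the scores would be measured with error, so the tolerance would need to come from replicate variability rather than a fixed constant.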

Figure 2. Examples of diverse genetic interactions among Pol II trigger loop residues.

TL substitutions are shown as color-coded circles for known attributes (blue=biochemical/genetic LOF; green=biochemical and genetic GOF). The genetic interactions between them are shown as connecting lines. A yellow line indicates that suppression is observed in a particular double mutant. A black line indicates double mutant lethality. A gray dashed line indicates epistasis in the particular double mutant. F1084I shows predicted behavior with LOF N1082S (suppression) and with GOF E1103G (lethality) but unexpected behavior with LOF H1085Y (lethality instead of suppression).

From our experience above with a limited subset of double mutant combinations, it should be clear that a systematic examination of mutant combinations would be of great value in understanding residue interactions and protein function. In practice, however, comprehensive dissection of all possible double mutants is far from technically feasible for most proteins. For example, a small domain such as the Pol II TL (Rpb1 1076–1106) comprises 31 amino acids, each of which could be changed in 20 ways (19 other amino acids plus residue deletion), giving 31 × 20 = 620 possible single-substitution variants and (31 × 30/2) × 20 × 20 = 186,000 possible double-substitution variants. Phenotyping such a number of variants is feasible in some systems [73], but not easy in yeast without massive scale-up due to limitations in transformation efficiency. A judicious set of double mutants, based on a curated set of single substitution mutants with phenotypes that cover a range of biochemical properties, will likely be more feasible yet still effective for decoding a complex functional network such as the Pol II TL.
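The arithmetic above generalizes to any domain length and any number of alternatives per position. The following is a minimal sketch of that count; the function name and parameter names are illustrative, not taken from any published pipeline.

```python
from math import comb


def variant_counts(n_positions, alternatives_per_position):
    """Count single and double substitution variants of a domain.

    Every position can be changed in `alternatives_per_position` ways
    (here 19 other amino acids plus deletion = 20).
    """
    singles = n_positions * alternatives_per_position
    # Choose an unordered pair of positions, then an alternative at each.
    doubles = comb(n_positions, 2) * alternatives_per_position ** 2
    return singles, doubles


# Pol II TL, Rpb1 1076-1106: 31 positions, 20 alternatives each.
singles, doubles = variant_counts(31, 20)
print(singles, doubles)  # 620 186000
```

Because the double-mutant count grows with the square of both the domain length and the number of alternatives, even modest increases in domain size quickly exceed what a single transformation can cover.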

Previous genetic analysis of the TL coupled with evolutionary analysis has revealed context dependence for TL function, despite high TL conservation among multi-subunit RNA polymerases. A prominent example is the surprising discovery that identical substitutions in a conserved residue in Pol II (E1103G) and in yeast Pol I (E1224G) have opposite effects on activity in their respective enzymes. Pol II E1103G is hyperactive as noted above. In fact, RNA polymerases across all kingdoms of life can have hyperactivity conferred by mutations in this region of the enzyme [127, 128, 133, 140, 142, 153, 159–162]. However, Pol I E1224G confers a reduced elongation rate [155]. Complicating interpretation, single molecule analyses of yeast Pol II E1103G suggest positive effects on catalysis in this mutant but negative effects on translocation [133]. Therefore, it is possible to reconcile the distinct observations for the Pol II and Pol I versions of this substitution if the substitution is interpreted to have different effects on translocation and catalysis in Pol I than in Pol II. This model is consistent with either catalysis or translocation as potentially limiting during elongation [133, 142]. A further prediction of this model is that if additional mutations in Pol I rendered catalysis limiting, positive effects of E1224G should become apparent (i.e. an example of sign epistasis, where the effect of one mutation depends on the background in which it is examined). This is exactly what Viktorovskaya et al. observed [155]. Furthermore, the highly conserved Sce RNA polymerase I (Pol I) TL is greatly impaired when substituted into Pol II [155], while the Sce RNA polymerase III (Pol III) TL is compatible.
Examining single-mutant substitution phenotypes in Pol II, we inferred that the observed Pol I incompatibility likely stems from substitutions in the TL tip region, which normally support TL interactions with nearby funnel helices [75], and these interactions are indeed shown to be divergent by comparison of Pol I and Pol II crystal structures [75, 149, 163]. These observations highlight the importance of protein context for even highly conserved protein domains such as the TL and are consistent with studies illustrating that epistasis shapes enzymatic evolvability through allosterically impacting structural plasticity of the conserved active site. Using the large-scale genetic system for Pol II that we have established, we can survey the functional compatibility of all available evolutionary TL variants (hundreds) in the Pol II context. Using Pol I and Pol III TLs as a case study, all possible evolutionary intermediates from Pol I/III to Pol II TL can be examined in the Pol II context. Such an experiment can reveal how much of the potential epistasis for TL variants arises within the TL, where compensatory or other substitutions allow tolerance of individual incompatible variants, and how much would instead be predicted to reside in sequences outside the TL.
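Enumerating all evolutionary intermediates between two aligned sequences amounts to taking, at every position where they differ, either residue; two sequences that differ at d positions therefore define 2^d sequences including both endpoints. The sketch below illustrates this under the assumption of equal-length, gap-free alignments. The four-letter sequences are toy placeholders, not real TL sequences, and the function name is hypothetical.

```python
from itertools import product


def intermediates(start, end):
    """Return every sequence on any path between two aligned sequences.

    At positions where the sequences agree the residue is fixed; at
    positions where they differ, either residue may be present.
    """
    if len(start) != len(end):
        raise ValueError("sequences must be aligned to the same length")
    choices = [(a,) if a == b else (a, b) for a, b in zip(start, end)]
    return ["".join(residues) for residues in product(*choices)]


# Toy sequences differing at two positions: 2**2 = 4 sequences in total.
print(intermediates("ACDE", "ACFG"))
# ['ACDE', 'ACDG', 'ACFE', 'ACFG']
```

The exponential growth in d is what makes a pooled, sequencing-based readout attractive for this kind of design; whether a given pair of TLs yields a library of practical size depends on how many positions actually differ.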

The context dependence of TL function points to the critical importance of studying functional interactions between the TL and nearby domains. How mutations in TL-proximal domains or elsewhere in Pol II alter the TL phenotypic landscape (tolerance or sensitivity to TL substitution) can provide evidence for functional communication between different parts of the enzyme that may not be readily apparent from examination of structural data. External mutations likely shape the TL phenotypic landscape in specific ways, where both activity-dependent additive genetic interactions and specific inter-domain epistasis will be revealed. The distribution of the epistatic interactions will be informative for understanding the molecular evolution of the highly conserved TL domain in the Pol II enzymatic context. High-throughput genetic assays are now well suited to testing each of the questions above.

4. Conclusions

Pol II is a large protein complex essential for mRNA transcription, an intensely regulated step in gene expression. We have summarized our work towards understanding one critical domain that functions at the heart of Pol II catalysis in the context of how deep sequencing-based high-throughput assays allow tremendous insights into biological questions in general. We have highlighted insights into protein function that can be gleaned from deep mutational scanning experiments while striving to illuminate the possibility of applying these types of experiments to any system of interest. In essence, high-throughput functional assays are applicable to many biological systems in which large oligo variant pools and deep sequencing can be brought to bear.

Highlights.

Variant synthesis coupled with deep sequencing allows high-throughput functional assays

Deep mutational scanning is a high-throughput assay to study structure/function

Deep mutational scanning allows tests of fundamental molecular evolution questions

Deep mutational scanning reveals mechanistic insights into RNA Polymerase II

High-throughput functional assays are applicable to RNA-centered biology

Acknowledgements

We are grateful to the US National Institutes of Health grant NIH R01GM097260 to and to Welch Foundation grant A-1763 to CK for funding this work.

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

References:

  • [1].Ghindilis AL, Smith MW, Schwarzkopf KR, Roth KM, Peyvan K, Munro SB, Lodes MJ, Stover AG, Bernards K, Dill K, McShea A, CombiMatrix oligonucleotide arrays: genotyping and gene expression assays employing electrochemical detection, Biosens Bioelectron, 22 (2007) 1853–1860. [DOI] [PubMed] [Google Scholar]
  • [2].Kosuri S, Church GM, Large-scale de novo DNA synthesis: technologies and applications, Nat Methods, 11 (2014) 499–507. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [3].LeProust EM, Peck BJ, Spirin K, McCuen HB, Moore B, Namsaraev E, Caruthers MH, Synthesis of high-quality libraries of long (150mer) oligonucleotides by a novel depurination controlled process, Nucleic Acids Res, 38 (2010) 2522–2540. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [4].Castaldi PJ, Guo F, Qiao D, Du F, Naing ZZC, Li Y, Pham B, Mikkelsen TS, Cho MH, Silverman EK, Zhou X, Identification of Functional Variants in the FAM13A Chronic Obstructive Pulmonary Disease Genome-Wide Association Study Locus by Massively Parallel Reporter Assays, Am J Respir Crit Care Med, 199 (2019) 52–61. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [5].Cuperus JT, Groves B, Kuchina A, Rosenberg AB, Jojic N, Fields S, Seelig G, Deep learning of the regulatory grammar of yeast 5’ untranslated regions from 500,000 random sequences, Genome Res, 27 (2017) 2015–2024. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [6].Dailey L, High throughput technologies for the functional discovery of mammalian enhancers: new approaches for understanding transcriptional regulatory network dynamics, Genomics, 106 (2015) 151–158. [DOI] [PubMed] [Google Scholar]
  • [7].Ernst J, Melnikov A, Zhang X, Wang L, Rogov P, Mikkelsen TS, Kellis M, Genome-scale high-resolution mapping of activating and repressive nucleotides in regulatory regions, Nat Biotechnol, 34 (2016) 1180–1190. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [8].Grossman SR, Zhang X, Wang L, Engreitz J, Melnikov A, Rogov P, Tewhey R, Isakova A, Deplancke B, Bernstein BE, Mikkelsen TS, Lander ES, Systematic dissection of genomic features determining transcription factor binding and enhancer function, Proc Natl Acad Sci U S A, 114 (2017) E1291–E1300. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [9].Inoue F, Ahituv N, Decoding enhancers using massively parallel reporter assays, Genomics, 106 (2015) 159–164. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [10].Kheradpour P, Ernst J, Melnikov A, Rogov P, Wang L, Zhang X, Alston J, Mikkelsen TS, Kellis M, Systematic dissection of regulatory motifs in 2000 predicted human enhancers using a massively parallel reporter assay, Genome Res, 23 (2013) 800–811. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [11].Levo M, Avnit-Sagi T, Lotan-Pompan M, Kalma Y, Weinberger A, Yakhini Z, Segal E, Systematic Investigation of Transcription Factor Activity in the Context of Chromatin Using Massively Parallel Binding and Expression Assays, Mol Cell, 65 (2017) 604–617 e606. [DOI] [PubMed] [Google Scholar]
  • [12].Maricque BB, Dougherty JD, Cohen BA, A genome-integrated massively parallel reporter assay reveals DNA sequence determinants of cis-regulatory activity in neural cells, Nucleic Acids Res, 45 (2017) e16. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [13].Melnikov A, Murugan A, Zhang X, Tesileanu T, Wang L, Rogov P, Feizi S, Gnirke A, Callan CG Jr., Kinney JB, Kellis M, Lander ES, Mikkelsen TS, Systematic dissection and optimization of inducible enhancers in human cells using a massively parallel reporter assay, Nat Biotechnol, 30 (2012) 271–277. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [14].Melnikov A, Zhang X, Rogov P, Wang L, Mikkelsen TS, Massively parallel reporter assays in cultured mammalian cells, J Vis Exp, (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [15].Mogno I, Kwasnieski JC, Cohen BA, Massively parallel synthetic promoter assays reveal the in vivo effects of binding site variants, Genome Res, 23 (2013) 1908–1915. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [16].Patwardhan RP, Hiatt JB, Witten DM, Kim MJ, Smith RP, May D, Lee C, Andrie JM, Lee SI, Cooper GM, Ahituv N, Pennacchio LA, Shendure J, Massively parallel functional dissection of mammalian enhancers in vivo, Nat Biotechnol, 30 (2012) 265–270. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [17].Patwardhan RP, Lee C, Litvin O, Young DL, Pe’er D, Shendure J, High-resolution analysis of DNA regulatory elements by synthetic saturation mutagenesis, Nat Biotechnol, 27 (2009) 1173–1175. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [18].Rabani M, Pieper L, Chew GL, Schier AF, A Massively Parallel Reporter Assay of 3’ UTR Sequences Identifies In Vivo Rules for mRNA Degradation, Mol Cell, 68 (2017) 1083–1094 e1085. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [19].Rich MS, Payen C, Rubin AF, Ong GT, Sanchez MR, Yachie N, Dunham MJ, Fields S, Comprehensive Analysis of the SUL1 Promoter of Saccharomyces cerevisiae, Genetics, 203 (2016) 191–202. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [20].Rosenberg AB, Patwardhan RP, Shendure J, Seelig G, Learning the sequence determinants of alternative splicing from millions of random sequences, Cell, 163 (2015) 698–711. [DOI] [PubMed] [Google Scholar]
  • [21].Rosenberg AB, Roco CM, Muscat RA, Kuchina A, Sample P, Yao Z, Graybuck LT, Peeler DJ, Mukherjee S, Chen W, Pun SH, Sellers DL, Tasic B, Seelig G, Single-cell profiling of the developing mouse brain and spinal cord with split-pool barcoding, Science, 360 (2018) 176–182. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [22].Sample PJ, Wang B, Reid DW, Presnyak V, McFadyen I, Morris DR, Seelig G, Human 5′ UTR design and variant effect prediction from a massively parallel translation assay, bioRxiv, (2018) 310375. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [23].Sharon E, van Dijk D, Kalma Y, Keren L, Manor O, Yakhini Z, Segal E, Probing the effect of promoters on noise in gene expression using thousands of designed sequences, Genome Res, 24 (2014) 1698–1706. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [24].Shen SQ, Myers CA, Hughes AE, Byrne LC, Flannery JG, Corbo JC, Massively parallel cis-regulatory analysis in the mammalian central nervous system, Genome Res, 26 (2016) 238–255. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [25].Smith RP, Taher L, Patwardhan RP, Kim MJ, Inoue F, Shendure J, Ovcharenko I, Ahituv N, Massively parallel decoding of mammalian regulatory sequences supports a flexible organizational model, Nat Genet, 45 (2013) 1021–1028. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [26].Tewhey R, Kotliar D, Park DS, Liu B, Winnicki S, Reilly SK, Andersen KG, Mikkelsen TS, Lander ES, Schaffner SF, Sabeti PC, Direct Identification of Hundreds of Expression-Modulating Variants using a Multiplexed Reporter Assay, Cell, 165 (2016) 1519–1529. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [27].Ulirsch JC, Nandakumar SK, Wang L, Giani FC, Zhang X, Rogov P, Melnikov A, McDonel P, Do R, Mikkelsen TS, Sankaran VG, Systematic Functional Dissection of Common Genetic Variation Affecting Red Blood Cell Traits, Cell, 165 (2016) 1530–1545. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [28].White MA, Understanding how cis-regulatory function is encoded in DNA sequence using massively parallel reporter assays and designed sequences, Genomics, 106 (2015) 165–170. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [29].White MA, Myers CA, Corbo JC, Cohen BA, Massively parallel in vivo enhancer assay reveals that highly local features determine the cis-regulatory function of ChIP-seq peaks, Proc Natl Acad Sci U S A, 110 (2013) 11952–11957. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [30].Zhao W, Pollack JL, Blagev DP, Zaitlen N, McManus MT, Erle DJ, Massively parallel functional annotation of 3’ untranslated regions, Nat Biotechnol, 32 (2014) 387–391. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [31].Vvedenskaya IO, Zhang Y, Goldman SR, Valenti A, Visone V, Taylor DM, Ebright RH, Nickels BE, Massively Systematic Transcript End Readout, “MASTER”: Transcription Start Site Selection, Transcriptional Slippage, and Transcript Yields, Mol Cell, 60 (2015) 953–965. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [32].Vvedenskaya IO, Vahedian-Movahed H, Zhang Y, Taylor DM, Ebright RH, Nickels BE, Interactions between RNA polymerase and the core recognition element are a determinant of transcription start site selection, Proc Natl Acad Sci U S A, 113 (2016) E2899–2905. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [33].Winkelman JT, Vvedenskaya IO, Zhang Y, Zhang Y, Bird JG, Taylor DM, Gourse RL, Ebright RH, Nickels BE, Multiplexed protein-DNA cross-linking: Scrunching in transcription start site selection, Science, 351 (2016) 1090–1093. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [34].Vvedenskaya IO, Goldman SR, Nickels BE, Analysis of Bacterial Transcription by “Massively Systematic Transcript End Readout,” MASTER, Methods Enzymol, 612 (2018) 269–302. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [35].Cheung R, Insigne KD, Yao D, Burghard CP, Wang J, Hsiao Y-HE, Jones EM, Goodman DB, Xiao X, Kosuri S, A Multiplexed Assay for Exon Recognition Reveals that an Unappreciated Fraction of Rare Genetic Variants Cause Large-Effect Splicing Disruptions, Molecular cell, (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [36].Wong MS, Kinney JB, Krainer AR, Quantitative Activity Profile and Context Dependence of All Human 5’ Splice Sites, Mol Cell, 71 (2018) 1012–1026 e1013. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [37].Oikonomou P, Goodarzi H, Tavazoie S, Systematic identification of regulatory elements in conserved 3’ UTRs of human transcripts, Cell Rep, 7 (2014) 281–292. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [38].Shalem O, Sharon E, Lubliner S, Regev I, Lotan-Pompan M, Yakhini Z, Segal E, Systematic dissection of the sequence determinants of gene 3’ end mediated expression control, PLoS Genet, 11 (2015) e1005147. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [39].Yartseva V, Takacs CM, Vejnar CE, Lee MT, Giraldez AJ, RESA identifies mRNA-regulatory sequences at high resolution, Nat Methods, 14 (2017) 201–207. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [40].Vainberg Slutskin I, Weingarten-Gabbay S, Nir R, Weinberger A, Segal E, Unraveling the determinants of microRNA mediated regulation using a massively parallel reporter assay, Nat Commun, 9 (2018) 529. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [41].Kosuri S, Goodman DB, Cambray G, Mutalik VK, Gao Y, Arkin AP, Endy D, Church GM, Composability of regulatory sequences controlling transcription and translation in Escherichia coli, Proc Natl Acad Sci U S A, 110 (2013) 14024–14029. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [42].Noderer WL, Flockhart RJ, Bhaduri A, Diaz de Arce AJ, Zhang J, Khavari PA, Wang CL, Quantitative analysis of mammalian translation initiation sites by FACS-seq, Mol Syst Biol, 10 (2014) 748. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [43].Gritsenko AA, Weingarten-Gabbay S, Elias-Kirma S, Nir R, de Ridder D, Segal E, Sequence features of viral and human Internal Ribosome Entry Sites predictive of their activity, PLoS Comput Biol, 13 (2017) e1005734. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [44].Michaels YS, Barnkob MB, Barbosa H, Baeumler TA, Thompson MK, Andre V, Colin-York H, Fritzsche M, Gileadi U, Sheppard HM, Precise tuning of gene expression output levels in mammalian cells, bioRxiv, (2018) 352377. [Google Scholar]
  • [45].Fowler DM, Araya CL, Fleishman SJ, Kellogg EH, Stephany JJ, Baker D, Fields S, High-resolution mapping of protein sequence-function relationships, Nat Methods, 7 (2010) 741–746. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [46].Araya CL, Fowler DM, Deep mutational scanning: assessing protein function on a massive scale, Trends Biotechnol, 29 (2011) 435–442. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [47].Araya CL, Fowler DM, Chen W, Muniez I, Kelly JW, Fields S, A fundamental protein property, thermodynamic stability, revealed solely from large-scale measurements of protein function, Proc Natl Acad Sci U S A, 109 (2012) 16858–16863. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [48].McLaughlin RN Jr., Poelwijk FJ, Raman A, Gosal WS, Ranganathan R, The spatial architecture of protein function and adaptation, Nature, 491 (2012) 138–142. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [49].Kim I, Miller CR, Young DL, Fields S, High-throughput analysis of in vivo protein stability, Mol Cell Proteomics, 12 (2013) 3370–3378. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [50].Melamed D, Young DL, Gamble CE, Miller CR, Fields S, Deep mutational scanning of an RRM domain of the Saccharomyces cerevisiae poly(A)-binding protein, RNA, 19 (2013) 1537–1551. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [51].Starita LM, Pruneda JN, Lo RS, Fowler DM, Kim HJ, Hiatt JB, Shendure J, Brzovic PS, Fields S, Klevit RE, Activity-enhancing mutations in an E3 ubiquitin ligase identified by high-throughput mutagenesis, Proc Natl Acad Sci U S A, 110 (2013) E1263–1272. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [52].Tinberg CE, Khare SD, Dou J, Doyle L, Nelson JW, Schena A, Jankowski W, Kalodimos CG, Johnsson K, Stoddard BL, Baker D, Computational design of ligand-binding proteins with high affinity and selectivity, Nature, 501 (2013) 212–216. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [53].Fowler DM, Fields S, Deep mutational scanning: a new style of protein science, Nat Methods, 11 (2014) 801–807. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [54].Fowler DM, Stephany JJ, Fields S, Measuring the activity of protein variants on a large scale using deep mutational scanning, Nat Protoc, 9 (2014) 2267–2284. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [55].Guy MP, Young DL, Payea MJ, Zhang X, Kon Y, Dean KM, Grayhack EJ, Mathews DH, Fields S, Phizicky EM, Identification of the determinants of tRNA function and susceptibility to rapid tRNA decay by high-throughput in vivo analysis, Genes Dev, 28 (2014) 1721–1732. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [56].Raman S, Taylor N, Genuth N, Fields S, Church GM, Engineering allostery, Trends Genet, 30 (2014) 521–528. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [57].Feng J, Jester BW, Tinberg CE, Mandell DJ, Antunes MS, Chari R, Morey KJ, Rios X, Medford JI, Church GM, Fields S, Baker D, A general strategy to construct small molecule biosensors in eukaryotes, Elife, 4 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [58].Kitzman JO, Starita LM, Lo RS, Fields S, Shendure J, Massively parallel single-amino-acid mutagenesis, Nat Methods, 12 (2015) 203–206, 204 p following 206. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [59].Melamed D, Young DL, Miller CR, Fields S, Combining natural sequence variation with high throughput mutational data to reveal protein interaction sites, PLoS Genet, 11 (2015) e1004918. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [60].Starita LM, Young DL, Islam M, Kitzman JO, Gullingsrud J, Hause RJ, Fowler DM, Parvin JD, Shendure J, Fields S, Massively Parallel Functional Analysis of BRCA1 RING Domain Variants, Genetics, 200 (2015) 413–422. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [61].Stiffler MA, Hekstra DR, Ranganathan R, Evolvability as a function of purifying selection in TEM-1 beta-lactamase, Cell, 160 (2015) 882–892. [DOI] [PubMed] [Google Scholar]
  • [62].Brenan L, Andreev A, Cohen O, Pantel S, Kamburov A, Cacchiarelli D, Persky NS, Zhu C, Bagul M, Goetz EM, Burgin AB, Garraway LA, Getz G, Mikkelsen TS, Piccioni F, Root DE, Johannessen CM, Phenotypic Characterization of a Comprehensive Set of MAPK1/ERK2 Missense Mutants, Cell Rep, 17 (2016) 1171–1183. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [63].Gamble CE, Brule CE, Dean KM, Fields S, Grayhack EJ, Adjacent Codons Act in Concert to Modulate Translation Efficiency in Yeast, Cell, 166 (2016) 679–690. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [64].Li C, Qian W, Maclean CJ, Zhang J, The fitness landscape of a tRNA gene, Science, 352 (2016) 837–840. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [65].Stiffler MA, Subramanian SK, Salinas VH, Ranganathan R, A Protocol for Functional Assessment of Whole-Protein Saturation Mutagenesis Libraries Utilizing High-Throughput Sequencing, J Vis Exp, (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [66].Taylor ND, Garruss AS, Moretti R, Chan S, Arbing MA, Cascio D, Rogers JK, Isaacs FJ, Kosuri S, Baker D, Fields S, Church GM, Raman S, Engineering an allosteric transcription factor to respond to new ligands, Nat Methods, 13 (2016) 177–183. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [67].Bandaru P, Shah NH, Bhattacharyya M, Barton JP, Kondo Y, Cofsky JC, Gee CL, Chakraborty AK, Kortemme T, Ranganathan R, Kuriyan J, Deconstruction of the Ras switching cycle through saturation mutagenesis, Elife, 6 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [68].Bhagavatula G, Rich MS, Young DL, Marin M, Fields S, A Massively Parallel Fluorescence Assay to Characterize the Effects of Synonymous Mutations on TP53 Expression, Mol Cancer Res, 15 (2017) 1301–1307. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [69].Rocklin GJ, Chidyausiku TM, Goreshnik I, Ford A, Houliston S, Lemak A, Carter L, Ravichandran R, Mulligan VK, Chevalier A, Arrowsmith CH, Baker D, Global analysis of protein folding using massively parallel design, synthesis, and testing, Science, 357 (2017) 168–175. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [70].Dorrity MW, Cuperus JT, Carlisle JA, Fields S, Queitsch C, Preferences in a trait decision determined by transcription factor variants, Proc Natl Acad Sci U S A, 115 (2018) E7997–E8006. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [71].Giacomelli AO, Yang X, Lintner RE, McFarland JM, Duby M, Kim J, Howard TP, Takeda DY, Ly SH, Kim E, Gannon HS, Hurhula B, Sharpe T, Goodale A, Fritchman B, Steelman S, Vazquez F, Tsherniak A, Aguirre AJ, Doench JG, Piccioni F, Roberts CWM, Meyerson M, Getz G, Johannessen CM, Root DE, Hahn WC, Mutational processes shape the landscape of TP53 mutations in human cancer, Nat Genet, 50 (2018) 1381–1387. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [72].Matreyek KA, Starita LM, Stephany JJ, Martin B, Chiasson MA, Gray VE, Kircher M, Khechaduri A, Dines JN, Hause RJ, Bhatia S, Evans WE, Relling MV, Yang W, Shendure J, Fowler DM, Multiplex assessment of protein variant abundance by massively parallel sequencing, Nat Genet, 50 (2018) 874–882. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [73].Salinas VH, Ranganathan R, Coevolution-based inference of amino acid interactions underlying protein function, Elife, 7 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [74].Sievers QL, Petzold G, Bunker RD, Renneville A, Slabicki M, Liddicoat BJ, Abdulrahman W, Mikkelsen T, Ebert BL, Thoma NH, Defining the human C2H2 zinc finger degrome targeted by thalidomide analogs through CRBN, Science, 362 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [75].Qiu C, Erinne OC, Dave JM, Cui P, Jin H, Muthukrishnan N, Tang LK, Babu SG, Lam KC, Vandeventer PJ, Strohner R, Van den Brulle J, Sze SH, Kaplan CD, High-Resolution Phenotypic Landscape of the RNA Polymerase II Trigger Loop, PLoS Genet, 12 (2016) e1006321. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [76].Starr TN, Picton LK, Thornton JW, Alternative evolutionary histories in the sequence space of an ancient protein, Nature, 549 (2017) 409–413. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [77].Lubock NB, Zhang D, Sidore AM, Church GM, Kosuri S, A systematic comparison of error correction enzymes by next-generation sequencing, Nucleic Acids Res, 45 (2017) 9206–9217. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [78].Melnikov A, Rogov P, Wang L, Gnirke A, Mikkelsen TS, Comprehensive mutational scanning of a kinase in vivo reveals substrate-dependent fitness landscapes, Nucleic Acids Res, 42 (2014) e112. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [79].Majithia AR, Tsuda B, Agostini M, Gnanapradeepan K, Rice R, Peloso G, Patel KA, Zhang X, Broekema MF, Patterson N, Prospective functional classification of all possible missense variants in PPARG, Nature genetics, 48 (2016) 1570. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [80].Subramanian SK, Russ WP, Ranganathan R, A set of experimentally validated, mutually orthogonal primers for combinatorially specifying genetic components, Synthetic Biology, 3 (2018) ysx008. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [81].Plesa C, Sidore AM, Lubock NB, Zhang D, Kosuri S, Multiplexed gene synthesis in emulsions for exploring protein functional landscapes, Science, 359 (2018) 343–347. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [82].Kosuri S, Eroshenko N, Leproust EM, Super M, Way J, Li JB, Church GM, Scalable gene synthesis by selective amplification of DNA pools from high-fidelity microchips, Nat Biotechnol, 28 (2010) 1295–1299. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [83].Quail MA, Otto TD, Gu Y, Harris SR, Skelly TF, McQuillan JA, Swerdlow HP, Oyola SO, Optimal enzymes for amplifying sequencing libraries, Nature methods, 9 (2012) 10. [DOI] [PubMed] [Google Scholar]
  • [84].Sorefan K, Pais H, Hall AE, Kozomara A, Griffiths-Jones S, Moulton V, Dalmay T, Reducing ligation bias of small RNAs in libraries for next generation sequencing, Silence, 3 (2012) 4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [85].Gonzalez JM, Zimmermann J, Saiz-Jimenez C, Evaluating putative chimeric sequences from PCR-amplified products, Bioinformatics, 21 (2005) 333–337. [DOI] [PubMed] [Google Scholar]
  • [86].Williams R, Peisajovich SG, Miller OJ, Magdassi S, Tawfik DS, Griffiths AD, Amplification of complex gene libraries by emulsion PCR, Nat Methods, 3 (2006) 545–550. [DOI] [PubMed] [Google Scholar]
  • [87].Hori M, Fukano H, Suzuki Y, Uniform amplification of multiple DNAs by emulsion PCR, Biochem Biophys Res Commun, 352 (2007) 323–328. [DOI] [PubMed] [Google Scholar]
  • [88].Tewhey R, Warner JB, Nakano M, Libby B, Medkova M, David PH, Kotsopoulos SK, Samuels ML, Hutchison JB, Larson JW, Topol EJ, Weiner MP, Harismendy O, Olson J, Link DR, Frazer KA, Microdroplet-based PCR enrichment for large-scale targeted sequencing, Nat Biotechnol, 27 (2009) 1025–1031. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [89].Smyth RP, Schlub TE, Grimm A, Venturi V, Chopra A, Mallal S, Davenport MP, Mak J, Reducing chimera formation during PCR amplification to ensure accurate genotyping, Gene, 469 (2010) 45–51. [DOI] [PubMed] [Google Scholar]
  • [90].Schloss PD, Gevers D, Westcott SL, Reducing the effects of PCR amplification and sequencing artifacts on 16S rRNA-based studies, PLoS One, 6 (2011) e27310. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [91].Schutze T, Rubelt F, Repkow J, Greiner N, Erdmann VA, Lehrach H, Konthur Z, Glokler J, A streamlined protocol for emulsion polymerase chain reaction and subsequent purification, Anal Biochem, 410 (2011) 155–157. [DOI] [PubMed] [Google Scholar]
  • [92].Zhu Z, Jenkins G, Zhang W, Zhang M, Guan Z, Yang CJ, Single-molecule emulsion PCR in microfluidic droplets, Anal Bioanal Chem, 403 (2012) 2127–2143. [DOI] [PubMed] [Google Scholar]
  • [93].Gonzalez JM, Evaluating putative chimeric sequences from PCR-amplified products, Encyclopedia of Metagenomics: Genes, Genomes and Metagenomes: Basics, Methods, Databases and Tools, (2015) 150–155. [Google Scholar]
  • [94].Pawluczyk M, Weiss J, Links MG, Egana Aranguren M, Wilkinson MD, Egea-Cortines M, Quantitative evaluation of bias in PCR amplification and next-generation sequencing derived from metabarcoding samples, Anal Bioanal Chem, 407 (2015) 1841–1848. [DOI] [PubMed] [Google Scholar]
  • [95].Islam S, Zeisel A, Joost S, La Manno G, Zajac P, Kasper M, Lönnerberg P, Linnarsson S, Quantitative single-cell RNA-seq with unique molecular identifiers, Nature methods, 11 (2014) 163. [DOI] [PubMed] [Google Scholar]
  • [96].Zhao L, Liu Z, Levy SF, Wu S, Bartender: a fast and accurate clustering algorithm to count barcode reads, Bioinformatics, 34 (2017) 739–747. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [97].Ziegenhain C, Vieth B, Parekh S, Reinius B, Guillaumet-Adkins A, Smets M, Leonhardt H, Heyn H, Hellmann I, Enard W, Comparative analysis of single-cell RNA sequencing methods, Molecular cell, 65 (2017) 631–643. e634. [DOI] [PubMed] [Google Scholar]
  • [98].Pflug FG, von Haeseler A, TRUmiCount: Correctly counting absolute numbers of molecules using unique molecular identifiers, Bioinformatics, 1 (2018) 8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [99].Rhoads A, Au KF, PacBio Sequencing and Its Applications, Genomics Proteomics Bioinformatics, 13 (2015) 278–289. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [100].Laszlo AH, Derrington IM, Ross BC, Brinkerhoff H, Adey A, Nova IC, Craig JM, Langford KW, Samson JM, Daza R, Doering K, Shendure J, Gundlach JH, Decoding long nanopore sequencing reads of natural DNA, Nat Biotechnol, 32 (2014) 829–833. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [101].Cherf GM, Lieberman KR, Rashid H, Lam CE, Karplus K, Akeson M, Automated forward and reverse ratcheting of DNA in a nanopore at 5-Å precision, Nat Biotechnol, 30 (2012) 344–348. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [102].Manrao EA, Derrington IM, Laszlo AH, Langford KW, Hopper MK, Gillgren N, Pavlenok M, Niederweis M, Gundlach JH, Reading DNA at single-nucleotide resolution with a mutant MspA nanopore and phi29 DNA polymerase, Nat Biotechnol, 30 (2012) 349–353. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [103].Snyder MW, Adey A, Kitzman JO, Shendure J, Haplotype-resolved genome sequencing: experimental methods and applications, Nat Rev Genet, 16 (2015) 344–358. [DOI] [PubMed] [Google Scholar]
  • [104].Klesmith JR, Bacik J-P, Wrenbeck EE, Michalczyk R, Whitehead TA, Trade-offs between enzyme fitness and solubility illuminated by deep mutational scanning, Proceedings of the National Academy of Sciences, 114 (2017) 2265–2270. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [105].Koenig P, Lee CV, Sanowar S, Wu P, Stinson J, Harris SF, Fuh G, Deep sequencing-guided design of a high affinity dual-specific antibody to target two angiogenic factors in neovascular age-related macular degeneration, Journal of Biological Chemistry, (2015) jbc. M115. 662783. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [106].Koenig P, Lee CV, Walters BT, Janakiraman V, Stinson J, Patapoff TW, Fuh G, Mutational landscape of antibody variable domains reveals a switch modulating the interdomain conformational dynamics and antigen binding, Proc Natl Acad Sci U S A, 114 (2017) E486–E495. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [107].Adams RM, Mora T, Walczak AM, Kinney JB, Measuring the sequence-affinity landscape of antibodies with massively parallel titration curves, Elife, 5 (2016) e23156. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [108].Gray VE, Hause RJ, Fowler DM, Analysis of Large-Scale Mutagenesis Data To Assess the Impact of Single Amino Acid Substitutions, Genetics, 207 (2017) 53–61. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [109].Weinreich DM, Lan Y, Wylie CS, Heckendorn RB, Should evolutionary geneticists worry about higher-order epistasis?, Current opinion in genetics & development, 23 (2013) 700–707. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [110].Sailer ZR, Harms MJ, Molecular ensembles make evolution unpredictable, Proceedings of the National Academy of Sciences, 114 (2017) 11938–11943. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [111].Sailer ZR, Harms MJ, High-order epistasis shapes evolutionary trajectories, PLoS Comput Biol, 13 (2017) e1005541. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [112].Sailer ZR, Harms MJ, Detecting High-Order Epistasis in Nonlinear Genotype-Phenotype Maps, Genetics, 205 (2017) 1079–1088. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [113].Halabi N, Rivoire O, Leibler S, Ranganathan R, Protein sectors: evolutionary units of three-dimensional structure, Cell, 138 (2009) 774–786. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [114].Lockless SW, Ranganathan R, Evolutionarily conserved pathways of energetic connectivity in protein families, Science, 286 (1999) 295–299. [DOI] [PubMed] [Google Scholar]
  • [115].Morcos F, Pagnani A, Lunt B, Bertolino A, Marks DS, Sander C, Zecchina R, Onuchic JN, Hwa T, Weigt M, Direct-coupling analysis of residue coevolution captures native contacts across many protein families, Proc Natl Acad Sci U S A, 108 (2011) E1293–1301. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [116].Suel GM, Lockless SW, Wall MA, Ranganathan R, Evolutionarily conserved networks of residues mediate allosteric communication in proteins, Nat Struct Biol, 10 (2003) 59–69. [DOI] [PubMed] [Google Scholar]
  • [117].Russ WP, Lowery DM, Mishra P, Yaffe MB, Ranganathan R, Natural-like function in artificial WW domains, Nature, 437 (2005) 579–583. [DOI] [PubMed] [Google Scholar]
  • [118].Socolich M, Lockless SW, Russ WP, Lee H, Gardner KH, Ranganathan R, Evolutionary information for specifying a protein fold, Nature, 437 (2005) 512–518. [DOI] [PubMed] [Google Scholar]
  • [119].Lee J, Natarajan M, Nashine VC, Socolich M, Vo T, Russ WP, Benkovic SJ, Ranganathan R, Surface sites for engineering allosteric control in proteins, Science, 322 (2008) 438–442. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [120].Reynolds KA, McLaughlin RN, Ranganathan R, Hot spots for allosteric regulation on protein surfaces, Cell, 147 (2011) 1564–1575. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [121].Raman AS, White KI, Ranganathan R, Origins of Allostery and Evolvability in Proteins: A Case Study, Cell, 166 (2016) 468–480. [DOI] [PubMed] [Google Scholar]
  • [122].Salinas VH, Ranganathan R, Coevolution-based inference of amino acid interactions underlying protein function, Elife, 7 (2018) e34300. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [123].Tokuriki N, Tawfik DS, Stability effects of mutations and protein evolvability, Current opinion in structural biology, 19 (2009) 596–604. [DOI] [PubMed] [Google Scholar]
  • [124].Starr TN, Thornton JW, Epistasis in protein evolution, Protein Science, 25 (2016) 1204–1218. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [125].Wang D, Bushnell DA, Westover KD, Kaplan CD, Kornberg RD, Structural basis of transcription: role of the trigger loop in substrate specificity and catalysis, Cell, 127 (2006) 941–954. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [126].Vassylyev DG, Vassylyeva MN, Zhang J, Palangat M, Artsimovitch I, Landick R, Structural basis for substrate loading in bacterial RNA polymerase, Nature, 448 (2007) 163. [DOI] [PubMed] [Google Scholar]
  • [127].Kaplan CD, Larsson K-M, Kornberg RD, The RNA polymerase II trigger loop functions in substrate selection and is directly targeted by α-amanitin, Molecular cell, 30 (2008) 547–556. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [128].Kireeva ML, Nedialkov YA, Cremona GH, Purtov YA, Lubkowska L, Malagon F, Burton ZF, Strathern JN, Kashlev M, Transient reversal of RNA polymerase II active site closing controls fidelity of transcription elongation, Molecular cell, 30 (2008) 557–566. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [129].Huang X, Wang D, Weiss DR, Bushnell DA, Kornberg RD, Levitt M, RNA polymerase II trigger loop residues stabilize and position the incoming nucleotide triphosphate in transcription, Proceedings of the National Academy of Sciences, 107 (2010) 15745–15750. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [130].Yuzenkova Y, Bochkareva A, Tadigotla VR, Roghanian M, Zorov S, Severinov K, Zenkin N, Stepwise mechanism for transcription fidelity, BMC biology, 8 (2010) 54. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [131].Zhang J, Palangat M, Landick R, Role of the RNA polymerase trigger loop in catalysis and pausing, Nature structural & molecular biology, 17 (2010) 99. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [132].Kaplan CD, Jin H, Zhang IL, Belyanin A, Dissection of Pol II trigger loop function and Pol II activity-dependent control of start site selection in vivo, PLoS Genet, 8 (2012) e1002627. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [133].Larson MH, Zhou J, Kaplan CD, Palangat M, Kornberg RD, Landick R, Block SM, Trigger loop dynamics mediate the balance between the transcriptional fidelity and speed of RNA polymerase II, Proceedings of the National Academy of Sciences, (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [134].Wang B, Predeus AV, Burton ZF, Feig M, Energetic and structural details of the trigger-loop closing transition in RNA polymerase II, Biophysical journal, 105 (2013) 767–775. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [135].Čabart P, Jin H, Li L, Kaplan CD, Activation and reactivation of the RNA polymerase II trigger loop for intrinsic RNA cleavage and catalysis, Transcription, 5 (2014) e28869. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [136].Xu L, Butler KV, Chong J, Wengel J, Kool ET, Wang D, Dissecting the chemical interactions and substrate structural signatures governing RNA polymerase II trigger loop closure by synthetic nucleic acid analogues, Nucleic acids research, 42 (2014) 5863–5870. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [137].Xu L, Zhang L, Chong J, Xu J, Huang X, Wang D, Strand-specific (asymmetric) contribution of phosphodiester linkages on RNA polymerase II transcriptional efficiency and fidelity, Proceedings of the National Academy of Sciences, 111 (2014) E3269–E3276. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [138].Hwang CS, Xu L, Wang W, Ulrich S, Zhang L, Chong J, Shin JH, Huang X, Kool ET, McKenna CE, Functional interplay between NTP leaving group and base pair recognition during RNA polymerase II nucleotide incorporation revealed by methylene substitution, Nucleic acids research, 44 (2016) 3820–3828. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [139].Mishanina TV, Palo MZ, Nayak D, Mooney RA, Landick R, Trigger loop of RNA polymerase is a positional, not acid–base, catalyst for both transcription and proofreading, Proceedings of the National Academy of Sciences, (2017) 201702383. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [140].Irvin JD, Kireeva ML, Gotte DR, Shafer BK, Huang I, Kashlev M, Strathern JN, A genetic assay for transcription errors reveals multilayer control of RNA polymerase II fidelity, PLoS genetics, 10 (2014) e1004532. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [141].Kaplan CD, Basic mechanisms of RNA polymerase II activity and alteration of gene expression in Saccharomyces cerevisiae, Biochimica et Biophysica Acta (BBA)-Gene Regulatory Mechanisms, 1829 (2013) 39–54. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [142].Dangkulwanich M, Ishibashi T, Liu S, Kireeva ML, Lubkowska L, Kashlev M, Bustamante CJ, Complete dissection of transcription elongation reveals slow translocation of RNA polymerase II in a linear ratchet mechanism, Elife, 2 (2013) e00971. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [143].Walmacq C, Kireeva ML, Irvin J, Nedialkov Y, Lubkowska L, Malagon F, Strathern JN, Kashlev M, Rpb9 subunit controls transcription fidelity by delaying NTP sequestration in RNA polymerase II, Journal of Biological Chemistry, (2009) jbc. M109. 006908. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [144].Strathern JN, Jin DJ, Court DL, Kashlev M, Isolation and characterization of transcription fidelity mutants, Biochimica et Biophysica Acta (BBA)-Gene Regulatory Mechanisms, 1819 (2012) 694–699. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [145].Kaster BC, Knippa KC, Kaplan CD, Peterson DO, RNA polymerase II Trigger loop mobility: indirect effects of Rpb9, Journal of Biological Chemistry, (2016) jbc. M116. 714394. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [146].Cheung AC, Cramer P, Structural basis of RNA polymerase II backtracking, arrest and reactivation, Nature, 471 (2011) 249. [DOI] [PubMed] [Google Scholar]
  • [147].Malinen AM, Turtola M, Parthiban M, Vainonen L, Johnson MS, Belogurov GA, Active site opening and closure control translocation of multisubunit RNA polymerase, Nucleic acids research, 40 (2012) 7442–7451. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [148].Nayak D, Voss M, Windgassen T, Mooney RA, Landick R, Cys-pair reporters detect a constrained trigger loop in a paused RNA polymerase, Molecular cell, 50 (2013) 882–893. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [149].Barnes CO, Calero M, Malik I, Graham BW, Spahr H, Lin G, Cohen AE, Brown IS, Zhang Q, Pullara F, Crystal structure of a transcribing RNA polymerase II complex reveals a complete transcription bubble, Molecular cell, 59 (2015) 258–269. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [150].Brueckner F, Cramer P, Structural basis of transcription inhibition by α-amanitin and implications for RNA polymerase II translocation, Nature structural & molecular biology, 15 (2008) 811. [DOI] [PubMed] [Google Scholar]
  • [151].Kettenberger H, Armache K-J, Cramer P, Complete RNA polymerase II elongation complex structure and its interactions with NTP and TFIIS, Molecular cell, 16 (2004) 955–965. [DOI] [PubMed] [Google Scholar]
  • [152].Silva D-A, Weiss DR, Avila FP, Da L-T, Levitt M, Wang D, Huang X, Millisecond dynamics of RNA polymerase II translocation at atomic resolution, Proceedings of the National Academy of Sciences, 111 (2014) 7665–7670. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [153].Malagon F, Kireeva ML, Shafer BK, Lubkowska L, Kashlev M, Strathern JN, Mutations in the Saccharomyces cerevisiae RPB1 Gene Conferring Hypersensitivity to 6-Azauracil, Genetics, (2006). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [154].Castro C, Smidansky ED, Arnold JJ, Maksimchuk KR, Moustafa I, Uchida A, Götte M, Konigsberg W, Cameron CE, Nucleic acid polymerases use a general acid for nucleotidyl transfer, Nature structural & molecular biology, 16 (2009) 212. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [155].Viktorovskaya OV, Engel KL, French SL, Cui P, Vandeventer PJ, Pavlovic EM, Beyer AL, Kaplan CD, Schneider DA, Divergent contributions of conserved active site residues to transcription by eukaryotic RNA polymerases I and II, Cell reports, 4 (2013) 974–984. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [156].Windgassen TA, Mooney RA, Nayak D, Palangat M, Zhang J, Landick R, Trigger-helix folding pathway and SI3 mediate catalysis and hairpin-stabilized pausing by Escherichia coli RNA polymerase, Nucleic acids research, 42 (2014) 12707–12721. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [157].Lane WJ, Darst SA, Molecular evolution of multisubunit RNA polymerases: sequence analysis, Journal of molecular biology, 395 (2010) 671–685. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [158].Epshtein V, Mustaev A, Markovtsov V, Bereshchenko O, Nikiforov V, Goldfarb A, Swing-gate model of nucleotide entry into the RNA polymerase active center, Molecular cell, 10 (2002) 623–634. [DOI] [PubMed] [Google Scholar]
  • [159].Bar-Nahum G, Epshtein V, Ruckenstein AE, Rafikov R, Mustaev A, Nudler E, A ratchet mechanism of transcription elongation and its control, Cell, 120 (2005) 183–193. [DOI] [PubMed] [Google Scholar]
  • [160].Tan L, Wiesler S, Trzaska D, Carney HC, Weinzierl RO, Bridge helix and trigger loop perturbations generate superactive RNA polymerases, Journal of biology, 7 (2008) 40. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [161].Weinzierl RO, The nucleotide addition cycle of RNA polymerase is controlled by two molecular hinges in the Bridge Helix domain, BMC biology, 8 (2010) 134. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [162].Rijal K, Maraia RJ, Active center control of termination by RNA polymerase III and tRNA gene transcription levels in vivo, PLoS genetics, 12 (2016) e1006253. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [163].Engel C, Sainsbury S, Cheung AC, Kostrewa D, Cramer P, RNA polymerase I structure and transcription regulation, Nature, 502 (2013) 650. [DOI] [PubMed] [Google Scholar]
