Abstract
Cancerous and aging cells have long been thought to be affected by transcription errors that cause genetic and epigenetic changes. Until now, the lack of a method for directly assessing such errors has hindered evaluation of their impact on cells. We report a high-resolution Illumina RNA-seq method that can assess noncoded base substitutions in mRNA at frequencies of 10⁻⁴–10⁻⁵ per base in vitro and in vivo. Statistically reliable detection of changes in transcription fidelity across ∼10³ nucleotide sites of DNA ensures that the RNA-seq can analyze fidelity at a large number of the sites where errors occur. Combining the RNA-seq with biochemical analyses of the error positions revealed two sequence-specific mechanisms that increase transcription fidelity by Escherichia coli RNA polymerase: (i) enhanced suppression of nucleotide misincorporation, which improves selectivity for the cognate substrate, and (ii) increased backtracking of the RNA polymerase, which decreases the chance of error propagation to the full-length transcript after misincorporation and provides an opportunity to proofread the error. This method is adaptable to genome-wide assessment of transcription fidelity.
INTRODUCTION
Transcription infidelity by RNA polymerases (RNAPs) has been proposed to contribute to genome instability (1) and heritable phenotypic changes (2,3), which may affect aging (4) and carcinogenesis (5,6). To date, transcription fidelity in vivo has been assessed with reporter genes targeting a small number of sequences with a limited spectrum of errors (1,7–10). To extrapolate this limited fidelity analysis to a genome-wide scale, transcription errors have been assumed to be randomly distributed. However, several reports have suggested that transcription errors exhibit strong sequence preferences (11–14). Fidelity analysis of the entire transcriptome has been limited by the lack of a reliable methodology. In the past decade, extensive in vitro analyses of transcription fidelity revealed several error-avoidance and error-correction mechanisms, based on biochemical assays for misincorporation of a single NMP (12,13,15–20) and on single-molecule assays using optical trapping (11,21). Typically, these experiments used limited or unbalanced substrate concentrations to detect misincorporation. Such in vitro data cannot easily be extrapolated to genetic fidelity assays involving reporter genes transcribed at high in vivo substrate concentrations and in the presence of transcription factors and DNA-compacting structural proteins (1,7–10,22,23). Therefore, there is an urgent need for an approach that allows simultaneous assessment of transcription fidelity in vivo and in vitro, under balanced NTP concentrations and on the same DNA sequences.
Deep sequencing technologies such as RNA sequencing (RNA-seq) can analyze ≥10¹⁰ bases in a single run, potentially allowing both genome-wide and in vitro detection of transcription errors at rates around 10⁻⁵ b⁻¹ (7,17,18). However, conventional RNA-seq protocols generate background errors at frequencies >10⁻⁵ b⁻¹ during cDNA library and cluster formation, sequencing/detection and read mapping (24), which has made transcription errors difficult to detect. Advanced deep-sequencing techniques tag individual DNA molecules with random sequences in polymerase chain reaction (PCR) primers and filter out PCR artifacts by counting only those errors that persist in all DNA molecules carrying the same tag (25–27). This tag-based method substantially reduces the randomly distributed PCR and sequencing errors of deep DNA/RNA sequencing (25–27). A remaining problem is that it cannot reduce the errors introduced by reverse transcriptases (RTs), which typically have lower fidelity than the DNA polymerases (DNAPs) used for PCR (28,29). More recently, a deep-sequencing method was developed that analyzes mismatches in overlapping read pairs to identify artifact errors, but not RT errors (30). Thus, no approach has so far been suitable for discriminating RNA errors from RNA-seq artifacts. Here, we present a high-resolution RNA-seq method based on a remarkable sequencing depth of 10⁶, together with several technical improvements that reduce background errors to the 10⁻⁵ and 10⁻⁴ levels. This technique enables statistically reliable detection of changes in transcription fidelity in vitro and in living cells, despite the presence of artifact errors.
This methodology may also be instrumental in addressing controversial noncanonical posttranscriptional RNA editing (31–35), in identifying genomic ‘hotspots’ for transcription errors and in assessing their contribution to the genetic diversity of viral populations (27,29,30,36).
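The tag-based filtering idea described above (refs 25–27) can be illustrated with a short sketch. This is a simplified, hypothetical illustration, not the pipeline of those studies: it assumes reads are already grouped by their random tag and aligned without indels to the same reference window, and the function names are invented here. A base call is kept only if every read in a tag family agrees. The same logic shows the limitation noted above: an RT error made before tagging is copied into every family member and therefore survives the filter.

```python
from collections import defaultdict


def family_consensus(reads, min_family_size=2):
    """Collapse reads sharing a random tag into one consensus sequence.

    reads: iterable of (tag, sequence) pairs; sequences are assumed to be
    aligned to the same reference window and to have equal length (no indels).
    Positions where family members disagree become 'N'. Families smaller than
    min_family_size are dropped because a lone read cannot be cross-checked.
    """
    families = defaultdict(list)
    for tag, seq in reads:
        families[tag].append(seq)
    consensus = {}
    for tag, seqs in families.items():
        if len(seqs) < min_family_size:
            continue
        bases = [col[0] if len(set(col)) == 1 else "N" for col in zip(*seqs)]
        consensus[tag] = "".join(bases)
    return consensus


def consensus_substitutions(consensus, reference):
    """List (tag, position, ref_base, alt_base) for changes shared by a whole family."""
    hits = []
    for tag, seq in sorted(consensus.items()):
        for pos, (ref, alt) in enumerate(zip(reference, seq)):
            if alt not in (ref, "N"):
                hits.append((tag, pos, ref, alt))
    return hits


# Hypothetical toy reads: in family "AAT" only one read carries G->T (filtered as an
# artifact); in family "GGC" every read carries G->A (kept); "TTA" is a singleton.
reads = [("AAT", "ACGT"), ("AAT", "ACGT"), ("AAT", "ACTT"),
         ("GGC", "ACAT"), ("GGC", "ACAT"), ("TTA", "ACCT")]
cons = family_consensus(reads)
print(cons)
print(consensus_substitutions(cons, "ACGT"))
```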
MATERIALS AND METHODS
Reagents
NTPs, oligonucleotides and DNA purification kits were purchased from GE Healthcare, Integrated DNA Technologies and Qiagen, respectively. NTPs used in the misincorporation assay (Figure 5 and Supplementary Figure S5) were further purified as described previously (17). The high-fidelity RT PrimeScript and the DNAP PrimeSTAR Max used for cDNA preparation were purchased from Takara Bio.
Figure 5.
Effects of backtracking on the efficiencies of mismatch extension (ME) and intrinsic transcript cleavage, and their dependence on Mn2+. (A) Reaction scheme for AMP misincorporation followed by ME. (B) RNA and downstream nontemplate DNA sequences in the TECs with long (18 nt) and short (8 nt) transcripts used in the assay. (C) Incubation of TEC18C/474 G with the noncognate ATP in the presence of Mg2+ or Mn2+. Arrows indicate the original 18-nt RNA, misincorporation (marked by asterisks) and ME. (D) Shortening the RNA at its 5′ end to 8 nt in TEC18C (yielding TEC8C) increases ME. (E) Quantification of ME (% of the total at each time point) from panels C and D. The curves represent single-exponential fits of the data; apparent rate constants (k) are shown. Note the larger k in the ‘Long, Mg2+’ condition compared with the ‘Long, Mn2+’ condition: this difference is due to intrinsic transcript cleavage of the 19 A* misincorporation product, which occurs substantially faster in Mg2+ than in Mn2+. The faster cleavage in Mg2+ leads to an apparent, earlier-than-expected saturation of the ME reaction under these conditions. Although the ME curves appear to follow single-exponential kinetics, they result from a superposition of three processes: 19 A* misincorporation, 19 A* cleavage and 19 A* extension with the next cognate NMP.
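The single-exponential fit in panel E presumably takes the standard form below. The caption does not give the equation, so this form is an assumption. Here ME(t) is the fraction of extended RNA at time t, ME_max is the plateau and k is the apparent rate constant. As the caption cautions, k lumps together misincorporation, cleavage and extension of 19 A*, so it should not be read as the rate of any single step.

```latex
\mathrm{ME}(t) = \mathrm{ME}_{\max}\left(1 - e^{-k t}\right)
```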
Proteins
RNAP holoenzyme from Escherichia coli RL-916 (a kind gift from Dr Robert Landick), containing a histidine-tagged RpoC subunit, was purified as described previously (37). The GreA and GreB expression plasmids pDNL278 and pMO1.4 were kind gifts from Dr Sergei Borukhov. The plasmids were transformed into E. coli XL1-Blue cells (Stratagene) for overexpression. Recombinant GreA and GreB were purified according to (38), with an additional Mono Q column (GE Healthcare) chromatography step.
In vitro RNA preparation
The pPR9 plasmid containing the lambda phage PR promoter and the fd phage terminator was used as the DNA template (Supplementary Figure S1A). The transcribed region comprises the rifampicin-resistant rpoB gene carrying a 1546G→T mutation, and partial rplL and rpoC genes of E. coli. The plasmid DNA was purified with a Qiagen mini-prep kit and phenol/chloroform/isoamyl alcohol (25:24:1). Residual phenol, which may affect transcription, was removed by extraction with diethyl ether. For the transcription reaction, 400 nM holoenzyme in the absence or presence of 12 mM GreA and 4 mM GreB was incubated with 1 mM NTPs and 2 nM plasmid DNA for 15 min at 37°C in transcription buffer [TB; 20 mM Tris–HCl, pH 7.9, 5 mM MgCl2 (or 1 mM MnCl2), 1 mM 2-mercaptoethanol, 0.1 M KCl, 0.1 mg/ml bovine serum albumin] (Supplementary Figure S1B). The reaction was stopped by heat denaturation for 3 min at 90°C, followed by DNase I (Takara Bio) treatment for 20 min at 37°C. We verified the production of a homogeneous 5.7-kb RNA by agarose-gel electrophoresis before adding DNase I (Supplementary Figure S1C). The 5.7-kb RNA was purified away from the digested DNA, NTPs, abortive oligo-RNA products and proteins as shown in Supplementary Figure S1D.
In vivo RNA preparation
Total RNA was prepared from E. coli strain MG1655 harboring the pPR9 plasmid. Cells were cultured at 28°C in LB medium containing ampicillin. The overnight culture was diluted 1:70 (v/v) into fresh medium and incubated for ∼2 h at 28°C (OD600 reached 0.35) and then for 2 h at 42°C (OD600 reached 2.3) to induce the PR promoter (39). Cells from 200 ml of culture were harvested and resuspended in a solution containing 0.5% sodium dodecyl sulphate, 20 mM sodium acetate (pH 5.5) and 10 mM EDTA. The suspended cells were mixed with an equal volume of prewarmed saturated phenol (20 mM sodium acetate, 10 mM EDTA, pH 5.5) and incubated for 5 min at 60°C. The mixture was centrifuged, and RNA and DNA were precipitated from the supernatant with ethanol. The pellet was dissolved in DNase I buffer with 10 U of DNase I and incubated for 30 min. RNA was separated from the digested DNA by acidic phenol extraction followed by G-50 Micro column (GE Healthcare) purification, and then precipitated with ethanol. The pellet was dissolved in diethylpyrocarbonate-treated water and used for cDNA synthesis.
Library preparation
The first DNA strand was synthesized from the PR-promoter transcript (0.8 µg of RNA synthesized in vitro or 5 µg of total RNA purified from E. coli cells) with PrimeScript RT. The RNA was mixed with 1 mM dNTPs and 5 µM of two specific primers (a and b, Figure 1A) that hybridize to the RNA at the 3′-most portions of DNA segments 1 and 6 (Figure 1A) of the first PCR. A hairpin structure between segments 1 and 2 inhibits RT elongation toward the 5′ end of the RNA. The mixture was incubated for 5 min at 65°C. PrimeScript, 1× PrimeScript buffer and RNase inhibitor were added according to the manufacturer's instructions, and the mixture was incubated for 45 min at 42°C, for 5 min at 37°C with RNase H and for 15 min at 70°C. The single-stranded DNA product was purified with a MinElute PCR purification kit and eluted with 10 µl of elution buffer. The first PCR, which includes second-strand synthesis, was performed with PrimeSTAR Max DNAP according to the manufacturer's instructions for 5 cycles with the in vitro RNA preparation and 10 cycles with the in vivo RNA preparation. We noticed that total RNA purified from E. coli cells contained ∼30-fold less of the PR-promoter transcript from pPR9 than the in vitro RNA preparation. Thus, the five additional cycles yield almost the same final concentrations of the cDNA libraries derived from the in vitro and in vivo RNA preparations. Single-stranded DNA purified with the Qiagen kit (one-tenth of the total reaction volume) and each primer pair carrying a barcode and the inner Illumina sequencing adapters in the 5′ tails (Figure 1A and Supplementary Table S5) were used for the PCR. This PCR amplified the six DNA segments comprising the cDNA (transcript) and the internal control (primer) regions (Figure 1A and B) for the five libraries with their respective barcodes. We confirmed that no first-PCR product was obtained with any primer pair when RNA without RT treatment was used as the template. The six DNA segments obtained from each first-PCR reaction tube were mixed, purified with the Qiagen kit and eluted with 10 µl of elution buffer. The second PCR was performed for 6 cycles with the pooled first-PCR products (one-fifth of the total reaction volume), a primer pair carrying the outer sequencing adapters in the 5′ tails (Supplementary Table S5) and the same PCR enzyme as in the first PCR. The presence of the full-length Illumina sequencing adapter and barcode sequence (Illumina TruSeq Index 1–5) in each of the five cDNA libraries was confirmed by Sanger sequencing.
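The choice of five extra first-PCR cycles for the in vivo RNA follows from a simple doubling argument: offsetting a ∼30-fold template deficit takes log₂30 ≈ 4.9 cycles under ideal PCR. The sketch below adds a per-cycle efficiency parameter, which is an assumption (the text reports no efficiency), and rounds up to whole cycles. The function name is hypothetical.

```python
import math


def extra_cycles_needed(fold_deficit: float, efficiency: float = 1.0) -> int:
    """Whole PCR cycles needed to offset a template deficit.

    Each cycle is assumed to multiply the product by (1 + efficiency);
    efficiency = 1.0 corresponds to perfect doubling.
    """
    if fold_deficit <= 1:
        return 0
    return math.ceil(math.log(fold_deficit) / math.log(1.0 + efficiency))


print(extra_cycles_needed(30))       # -> 5 (ideal doubling)
print(extra_cycles_needed(30, 0.9))  # -> 6 (slightly less efficient PCR)
```

With perfect doubling, five cycles give 2⁵ = 32-fold amplification, close to the ∼30-fold difference in transcript abundance; a lower real efficiency would leave the in vivo library slightly under-amplified.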
Figure 1.
Experimental setup for transcription-error analysis. (A) Schematic representation of the reverse transcription and two PCR steps used to produce barcoded cDNA libraries. Five libraries were made, one from each of the RNA samples corresponding to the five transcription conditions (Mg2+, Mn2+, GreAB/Mg2+ and GreAB/Mn2+ in vitro, and E. coli cells). The RT primers ‘a’ and ‘b’ (green arrowheads) replace transcription errors with chemical oligonucleotide-synthesis errors during the reverse transcription step. Similarly, during PCR, the first-PCR primers (green and yellow lines) replace (green lines) or dilute by >10-fold (yellow lines) transcription errors in the regions to which these primers hybridize (shown by empty boxes). A six-base barcode (purple line) and Illumina-specific sequencing adapters (orange and red lines) are introduced into the libraries during the first and second PCR steps. (B) The cDNA and internal control regions in the PCR fragment used for Illumina paired-end sequencing. The lengths and directions of the first and second sequencing reads are indicated. Both sequencing reads contain ∼20 bases of the primer-hybridizing regions, where transcription errors are strongly depleted during cDNA preparation (internal controls). Colors are as in panel A, except that DNA regions lacking the original mRNA sequence are white-shaded. (C) Scatter plot of transition-error rates for Mn2+ and Mg2+ RNA products in vitro. Positions in the cDNA and internal control are shown in red and blue, respectively. The diagonal dotted lines represent y = 2x (upper), y = x (middle) and y = x/2 (lower). The correlation coefficient (R) of the two samples with or without the cutoff (exclusion of rates >3 × 10⁻⁴ b⁻¹) is shown. (D) Transition-error rates in the second read (lower) of the paired-end sequencing are higher than those in the first read (upper). Transition-error rates averaged over the five RNA preparations and the six sequence segments (see panel A) are plotted against DNA position with standard deviations. The red line indicates the cutoff value.
Illumina sequencing
Quantifications of the numbers of amplifiable molecules in the libraries were performed by qPCR using a Library Quantification Kit (KK4824, Kapa Biosystems) and Agilent 2100 Bioanalyzer. The cluster generation on a paired-end flow cell and sequencing were performed with cBot and HiSeq 2000, respectively, according to the user guides of Illumina. The summary of sequencing data is shown in Supplementary Table S1. Raw sequencing data and processed data are available for download at http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE46479.
Data analysis for the sequencing
Initial data processing, including separation of reads by barcode and generation of fastq files, was performed with the CASAVA software (Illumina). The single large fastq file of high-quality reads (Phred score Q ≥ 30, see Supplementary Table S1) was split into about 10 smaller files with the shell script splitReads.sh (http://code.google.com/p/perm/downloads/detail?name=splitReads.sh) so that the SAMtools mpileup -A command could be used for the subsequent error analysis of the sequence reads (see below). The reads were aligned and mapped to the pPR9 plasmid DNA sequence (1056 bp) with Bowtie 0.12.7, which does not allow insertions or deletions in the alignment (40). We chose Bowtie parameters allowing up to three mismatches. To calculate error rates, we counted the numbers of A, T, G, C and N (not determined) calls at each position of the mapped reads with SAMtools 0.1.18 (41), supplemented by the Perl script parse_samtools_mpileup.pl (a kind gift from Dr Wei Shao). The rate of each substitution type at each position was calculated as the number of reads carrying that substitution divided by the number of reads carrying the reference base at that DNA position.
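The per-position rate defined above can be sketched in Python as a stand-in for the samtools mpileup plus parse_samtools_mpileup.pl step. This is a hypothetical re-implementation, not the authors' Perl script: the function names are invented, only the read-bases column of standard mpileup output is parsed, and base-quality filtering (Q ≥ 30) is assumed to have been done upstream. Following the definition in the text, the denominator is the number of reference-base reads, not the total depth.

```python
import re
from collections import Counter

_INDEL_LEN = re.compile(r"\d+")


def count_pileup_bases(ref_base: str, bases: str) -> Counter:
    """Count A/C/G/T/N calls in the read-bases column of one `samtools mpileup` line.

    '.' and ',' are reference matches on the forward/reverse strand; '^' marks a
    read start and is followed by one mapping-quality character; '$' marks a read
    end; '+n<seq>' / '-n<seq>' describe indels and are skipped; other symbols
    such as '*' (deleted base) are ignored.
    """
    ref = ref_base.upper()
    counts = Counter()
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":
            i += 2  # skip the marker and its mapping-quality character
        elif c == "$":
            i += 1
        elif c in "+-":
            length = _INDEL_LEN.match(bases, i + 1).group()
            i += 1 + len(length) + int(length)  # sign, digits, inserted/deleted bases
        else:
            if c in ".,":
                counts[ref] += 1
            elif c.upper() in "ACGTN":
                counts[c.upper()] += 1
            i += 1
    return counts


def substitution_rates(ref_base: str, counts: Counter) -> dict:
    """Rate of each substitution = reads with that base / reads with the reference base."""
    ref = ref_base.upper()
    n_ref = counts[ref]
    if n_ref == 0:
        return {}
    return {f"{ref}->{alt}": counts[alt] / n_ref for alt in "ACGT" if alt != ref}


counts = count_pileup_bases("G", "..,,A.a^I.$T")
print(counts)
print(substitution_rates("G", counts))
```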
Ternary elongation complex formation and biochemical transcription assays
The ternary elongation complexes (TECs) carrying 5′-labelled RNAs (see Supplementary Table S6) were assembled and immobilized on Ni2+-NTA agarose (Qiagen) in TB as described previously (42,43). Five to ten picomoles of RNAP were incubated with 7.5–15 pmol of the preannealed RNA–DNA hybrid in a 25–50 μl volume for 10 min at room temperature. Next, 15–50 pmol of the nontemplate DNA strand (NDS) were added for 10 min. The immobilized TEC9s were washed with TB containing 1 M KCl. The TECs were eluted from the Ni2+-NTA agarose with TB(-MgCl2) containing 100 mM imidazole as described previously (17) and diluted with TB(-MgCl2). TEC18s were obtained from TEC9s by walking on the templates of the 170 G and 474 G sequences (37). To allow 1 or 2 walking steps, the 14 G and 17 G on the 170 G and 474 G sequences, respectively, were substituted with C (see Figures 4A, 5B and Supplementary Figure S3A). The typical TEC concentration is ∼1 nM (44). All reactions were performed in TB at 37°C and were stopped with gel-loading buffer (5 M urea, 25 mM EDTA final concentrations). The RNA products were analyzed as described previously (17). Details of the experimental setups for the misincorporation, mismatch-extension, RNA-cleavage and NTP-competition assays are given in the corresponding figures or supplementary figures.
Figure 4.
Mn2+ sensitivity and frequency of G→A errors depend on the propensity of RNAP to backtrack at the error site. (A) Hierarchical clustering was performed with MeV v4.7.0. G→A error rates not exceeding 3 × 10⁻⁴ b⁻¹ at 132 DNA positions were used to generate the clustering diagram. The mean of the five RNA preparations was subtracted from each error rate to highlight differences in error rate among the transcription conditions at each position. Clusters A–G are indicated by boxes. The 10-nt DNA sequences (nontranscribed strand, 5′-to-3′ direction) in which the G→A error occurred at the 3′ RNA end are shown. The number to the left of each sequence indicates the position of the G residue analyzed by RNA-seq. Two sequences from clusters A and F (170 G and 474 G, respectively) that were used for biochemical analyses are underlined. (B) G→A transition rates at positions 170 G and 474 G in the five RNA preparations analyzed by RNA-seq. (C) Schematic representation of reversible backtracking of TEC18 bearing the 10-nt sequence of 170 G or 474 G at positions +10 to +19, where +1 is the 5′ end of the RNA. The access of ExoIII from the rear end of RNAP is also shown. (D) ExoIII footprinting of TEC18A and TEC18C. The reaction scheme is shown at the top. The rear-end boundaries of RNAP in the active and backtracked states are indicated. The bottom panel shows the 18-nt RNA transcripts in the TECs. In TEC names, the number gives the RNA length and the following capital letter gives the 3′ RNA base. AMP misincorporation at position 19 is marked by an asterisk.
Exonuclease III footprinting
Rear-end Exonuclease III (ExoIII) footprinting was performed as described previously (17,44). TECs were assembled on the 5′ end-labeled template DNA strand and the unlabeled NDS (Supplementary Table S6). The reaction was started by mixing 15 μl of TB containing 10 U of ExoIII (New England Biolabs) with 15 μl of the elongation complex at 30°C. To prevent digestion of the NDS by ExoIII during rear-end footprinting, the NDS carries a phosphorothioate bond at its 3′ end. The active (pretranslocated) and backtracked states of the TECs were identified from the shifts in the RNAP boundaries caused by stepwise extension of the RNA in the TECs (17,44).
RESULTS
Strategy for the assessment of transcription fidelity by RNA-seq
The strategy is based on two key assumptions: (i) the combined error rates of the RT and DNAPs used for RNA-seq can be reduced to the ∼10⁻⁵ b⁻¹ range by using high-fidelity enzymes; these estimated rates are based on information provided by the manufacturers and are consistent with our data (see Supplementary Table S3); and (ii) multi-subunit RNAPs generate errors with sequence preferences different from those of the structurally unrelated RTs/DNAPs, as suggested previously (11–14,36,45). We did not use the tag-based error-correction method to reduce artifact errors, because this approach cannot identify or correct RT errors and typically increases the number of PCR cycles owing to the loss of original PCR templates during DNA tagging (25–27). Instead, we substantially reduced the number of PCR cycles to minimize DNAP errors during library production (see ‘Materials and Methods’ section). To identify sequence sites dominated by transcription errors, we used error-prone and error-proof transcription conditions in vitro to increase and decrease transcription errors, respectively. In this system, transcription error rates in the sequencing reads change in a controlled manner, whereas the artifact errors remain constant. Detection of transcription errors should therefore be possible at sequence sites that favor transcription errors and disfavor artifact errors, even when the averaged enzymatic artifact rates exceed those of transcription. The prevalence of such sites across the analyzed sequences should also be evaluated statistically. We also reduced nonenzymatic sequencing errors caused by incorrect base calling (46) and misalignment (47) of the Illumina reads by applying a filter that eliminates these artificial error hotspots (see below). Finally, we increased the read depth to an average of 3 × 10⁶ to cover the predicted ∼10⁻⁵ b⁻¹ rate of transcription errors.
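Why the read depth matters can be shown with a back-of-the-envelope calculation: at 10⁻⁵ b⁻¹, a position read 3 × 10⁶ times is expected to yield ∼30 error reads. The sketch below is an illustrative power check, not the statistical procedure of this study. It assumes simple binomial sampling at a single position and ignores the artifact background, which, as noted above, is of similar or higher magnitude, so it is optimistic. The function name is hypothetical.

```python
import math


def two_proportion_z_test(k1: int, n1: int, k2: int, n2: int):
    """Two-sided z-test for a difference between two binomial error rates.

    k = reads carrying a given substitution, n = reads covering the position.
    Uses the pooled-proportion standard error and a normal approximation.
    """
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided tail probability
    return z, p_value


rate = 1e-5  # predicted transcription error rate quoted in the text
for depth in (3 * 10**5, 3 * 10**6):
    base = round(depth * rate)         # expected error reads at the standard rate
    doubled = round(depth * 2 * rate)  # expected error reads after a 2-fold increase
    z, p = two_proportion_z_test(doubled, depth, base, depth)
    print(f"depth={depth:,}: {base} vs {doubled} error reads, z={z:.2f}, p={p:.2g}")
```

Under these assumptions, a 2-fold change (3 versus 6 error reads) is indistinguishable from sampling noise at a depth of 3 × 10⁵, whereas at 3 × 10⁶ (30 versus 60 error reads) it gives p < 0.01.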
The RNA samples for RNA-seq were generated by transcription of the pPR9 plasmid (39) by E. coli RNAP in vitro and in vivo. The plasmid contains an ∼5.7-kb fragment of the E. coli rpoBC operon that is transcribed from the strong lambda phage PR promoter and terminated at an fd phage transcription terminator (Supplementary Figure S1A). Multi-round transcription by the purified RNAP holoenzyme generated ∼10¹⁵ RNA molecules with a uniform length of ∼5.7 kb (Supplementary Figure S1B–D). The reference transcription reaction was performed in TB (42) with 5 mM MgCl2 to determine the standard error rate. To reduce fidelity (the error-prone condition), we replaced Mg2+ with Mn2+ (48–50). To increase fidelity (the error-proof condition), we added the GreA/GreB proteins (51), which stimulate proofreading. We kept a balanced, high concentration of NTPs (1 mM) in all conditions to avoid forced, nonphysiological misincorporation, although actual NTP concentrations in vivo may not be uniform and vary under different growth conditions (52). For the in vivo fidelity measurement, we purified total RNA from a wild-type E. coli strain harboring the same pPR9 plasmid after 2-h induction of the PR promoter at 42°C (39).
We established a method for preparing five cDNA libraries, each with its own barcode, for Illumina sequencing (Figure 1A). The 6-nt barcodes allow all five in vitro and in vivo preparations to be multiplexed in a single sequencing run. The 5′ fragment of the 5.7-kb RNA transcripts was reverse transcribed, and the product was subjected to PCR that generated six ∼200-bp segments (Figure 1A). The primers contained a specific barcode for each of the five starting preparations and the inner Illumina sequencing adapters (Figure 1A). The second PCR step generated the final cDNA libraries for Illumina sequencing, using the first-step PCR product as a template and primers carrying the outer sequencing adapters in their 5′ tails (Figure 1A). In the mRNA segments to which the first-PCR primers hybridize, chemical-synthesis errors in the primers replace transcription errors in the first cycle, and each of the remaining four cycles dilutes the residual transcription errors a further 2-fold. Thus, transcription errors in the corresponding cDNA segments were replaced or reduced 16-fold by the 5 cycles of the first PCR (Figure 1A, empty boxes). Consequently, the contribution of transcription errors in these segments becomes negligible in the final cDNA libraries compared with the rest of the cDNA. Importantly, we used these outer segments (green and yellow lines, Figure 1A and B) as internal controls, comparing their error rates with those in the embedded cDNA segment carrying intact transcription errors (Figure 1B, blue lines). Base-substitution errors made during primer DNA synthesis are reported to occur at rates of 10⁻⁴–10⁻⁵ b⁻¹ (based on manufacturer information), which is consistent with our data (see Supplementary Figure S2 and Table S4).
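The 16-fold figure comes from counting the doublings after the first cycle: 2⁴ = 16 for a 5-cycle first PCR. A minimal sketch of that arithmetic, assuming ideal doubling in every cycle (the function name is hypothetical), also shows what the same reasoning implies for the 10-cycle first PCR used with the in vivo RNA. That second figure is an extrapolation, not a value stated in the text.

```python
def primer_region_dilution(cycles: int) -> int:
    """Fold reduction of original transcription errors in primer-binding segments.

    Follows the reasoning in the text: the first cycle replaces the segment with
    primer sequence on the newly made strand, and each later cycle halves the
    share of strands still carrying the RNA-derived sequence (ideal doubling).
    """
    if cycles < 1:
        raise ValueError("at least one PCR cycle is required")
    return 2 ** (cycles - 1)


for n in (5, 10):  # first-PCR cycles used for the in vitro and in vivo RNA
    print(f"{n} cycles: {primer_region_dilution(n)}-fold reduction")
```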
We obtained 191 099 124 reads with high base-calling quality [Phred score Q ≥ 30 (46)] by paired-end sequencing (Supplementary Table S1). Each sequenced read included the cDNA and the internal control sequence (Figure 1B). The uniquely mapped reads covered 1056 bp with an average read depth of 3 × 10⁶ (Supplementary Table S2). To assess the types and rates of RNA/DNA changes per position, we excluded insertion and deletion errors to avoid read misalignment (47) during bioinformatic analysis. In the mapped reads, we found a few positions with an abnormally high background of A→C (first read) and G→T (second read) transversions at frequencies of 10⁻² or 10⁻³ b⁻¹, which are unlikely to be transcription errors. These errors are probably caused by the relatively close emission spectra of the corresponding fluorophores and their incomplete separation by optical filters in the Illumina platform at these particular positions (24). These rare positions were excluded from the analysis.
We plotted transition-error rates for the cDNA sequences under the standard Mg2+ transcription condition against those under the error-prone Mn2+ condition (Figure 1C, red dots). We compared this plot with the corresponding plot for the internal controls, in which transcription errors were replaced or diluted by oligo-DNA synthesis errors (Figure 1C, blue dots). If there were no differences between the Mg2+ and Mn2+ sets (indicating a failure to detect transcription errors), the data should fall along the y = x line, as observed for the internal control positions (R > 0.9). In contrast, for the cDNA positions, the points with lower error rates lay in the y > x region (R < 0.9 for rates ≤3 × 10⁻⁴ b⁻¹). A two-tailed F-test on the Mg2+/Mn2+ RNA samples confirmed that error rates ≤3 × 10⁻⁴ b⁻¹ are not equally distributed in the cDNA [P = 2 × 10⁻⁴ (n = 540)], in contrast to their equal distribution in the internal controls [P = 0.5 (n = 125)]. Notably, transition errors occurring at rates >3 × 10⁻⁴ b⁻¹ were observed primarily in the second read, which requires an additional strand-synthesis step (Figure 1D). Therefore, these errors mostly derive from paired-end sequencing artifacts. We used this information to set a cutoff value of 3 × 10⁻⁴ b⁻¹ for our statistical analysis of transcription errors.
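The effect of the cutoff on R can be reproduced with a few lines of code. The sketch uses invented toy rates, not data from this study, and assumes that a position is kept only if both of its rates are at or below 3 × 10⁻⁴ b⁻¹; the text does not state whether the cutoff applies to one or both conditions. It shows how a few high-rate artifact hotspots lying near y = x can dominate R and mask the scatter of the low-rate positions.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def below_cutoff(pairs, cutoff=3e-4):
    """Keep positions whose rates are <= cutoff in both conditions (assumed rule)."""
    return [(a, b) for a, b in pairs if a <= cutoff and b <= cutoff]


# Hypothetical (Mg2+, Mn2+) transition-error rates per position, not real data:
# three low-rate positions plus two high-rate artifact hotspots near y = x.
pairs = [(1.0e-4, 2.0e-4), (1.5e-4, 1.6e-4), (0.8e-4, 1.9e-4),
         (1.0e-3, 1.0e-3), (2.0e-3, 2.0e-3)]
r_all = pearson_r(*zip(*pairs))
r_low = pearson_r(*zip(*below_cutoff(pairs)))
print(f"R without cutoff: {r_all:.3f}; R with cutoff: {r_low:.3f}")
```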
The high-resolution RNA-seq detects changes in transition error rates in vitro
Next, we compared each type of transition error separately between the two in vitro RNA samples representing the standard and error-prone transcription conditions (Mg2+/Mn2+ plot, Figure 2, left column). We observed an up to 2-fold Mn2+-dependent increase in G→A and T(U)→C transition errors at a majority of cDNA positions in the error range from 6 × 10⁻⁵ to 3 × 10⁻⁴ b⁻¹. A nonparametric t-test showed a significant difference between the means of the two samples (P < 0.05). We observed a slight Mn2+-dependent increase in the C→T(U) transition rate (P = 0.09) and no difference in A→G transitions (P = 0.7). Because the detected mean rate of A→G transitions was the lowest of the four transition types (Supplementary Figure S2), A→G transcription errors appeared to be masked by artifacts even under error-prone conditions for RNAP. Note that the internal control showed no significant effect of Mn2+ on any type of transition error (Figure 2, left column). Thus, the RNA-seq detected an Mn2+-dependent increase in three types of transcription errors made by E. coli RNAP in vitro at rates of 10⁻⁵–10⁻⁴ b⁻¹.
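The text reports a ‘nonparametric t-test’ without naming the specific test. As one common choice, and purely as an assumption, the sketch below implements a two-sided Mann-Whitney U test with a normal approximation (average ranks for ties, no tie correction of the variance). The function names are hypothetical; scipy.stats.mannwhitneyu offers exact and tie-corrected options for real analyses.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, no tie correction).

    Returns (U for sample x, two-sided p-value).
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    return u1, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical per-position rates (units of 1e-4 per base) for two conditions.
print(mann_whitney_u([1.0, 1.2, 0.9, 1.1], [1.8, 2.1, 1.6, 2.4]))
```

The normal approximation is reasonable for the 100 to 160 positions per transition type listed in Figure 2, but not for very small samples such as the toy example.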
Figure 2.
Scatter plots of transition-error rates. The error rates per position in the cDNA and internal control are plotted for the error-prone/standard (left column), error-prone/error-proof (middle column) and moderate-error-proof/error-proof (right column) sets of conditions, as shown at the top. Error rates ≤3 × 10⁻⁴ b⁻¹ were used for the statistical analysis. The P value of the two-tailed nonparametric t-test for the two samples is shown. For the cDNA, n = 132 (G→A), n = 142 (C→T), n = 104 (T→C) and n = 162 (A→G). For the internal control, n = 39 (G→A), n = 26 (C→T), n = 30 (T→C) and n = 30 (A→G).
To further validate the difference in error rates between the Mn2+ and Mg2+ samples, we added GreA/B to our standard reaction (Mg2+). GreA/B are expected to reduce errors through their proofreading activity (15,53). As expected, GreA/B amplified the differences in G→A, T(U)→C and C→T(U) transition rates between the Mn2+ and Mg2+ samples (Figure 2, middle column) and significantly reduced the mean rates of these three transition errors in both the Mg2+ and Mn2+ samples (Figure 2, right column), indicating that we detected the proofreading activity of GreA/B under both Mg2+ and Mn2+ conditions. GreA/B did not affect the error rates in the internal control. A→G transitions were also unaffected by GreA/B in both the cDNA and internal control regions, again suggesting that most A→G errors are artifacts. At a fraction of cDNA positions for the other three transition types, we also observed no significant changes in error rates between the error-prone and error-proof transcription conditions (Figure 2).
Comparison of transition-error rates in vitro and in vivo
It is broadly assumed that RNAP has similar intrinsic fidelity in vivo and in vitro (7,17). However, in vitro fidelity assessed by single-NMP misincorporation assays does not account for error propagation to full-length RNA. Moreover, in vitro fidelity, defined as the ratio of kpol/Kd for cognate and noncognate NTPs (17,18), does not take into account the proofreading activity of RNAP, which requires backtracking of the enzyme. In vivo fidelity could also be affected by local DNA structures, DNA damage and the promoter strength of the gene (1,54). Therefore, in vitro fidelity may not directly reflect in vivo fidelity.
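The in vitro fidelity definition mentioned above can be written out explicitly. Under the standard treatment of competing substrates, stated here as general background rather than taken from this study, the misincorporation frequency at a site is approximately the inverse of this ratio scaled by the NTP concentration ratio. With the balanced 1 mM NTPs used here, it reduces to about 1/F. The expression describes a single incorporation step, so it omits proofreading and error propagation to full-length RNA, which is exactly the limitation noted in the text.

```latex
F = \frac{\left(k_{\mathrm{pol}}/K_d\right)_{\text{cognate}}}{\left(k_{\mathrm{pol}}/K_d\right)_{\text{noncognate}}},
\qquad
\text{error frequency} \approx \frac{1}{F}\cdot\frac{[\mathrm{NTP}]_{\text{noncognate}}}{[\mathrm{NTP}]_{\text{cognate}}}
```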
To evaluate these differences, we used RNA-seq to compare the error rates for the same RNA produced in vitro or in vivo. Scatter plots visualized the differences in transition-error rates between the in vivo sample and the in vitro Mg2+ samples ± GreA/B (Figure 3). At the cDNA positions, we observed significant differences between the in vivo and standard (+Mg2+) in vitro samples: C→T(U) transitions were overrepresented in the in vivo sample (P < 0.05), whereas G→A and T(U)→C transitions were underrepresented (P ≤ 0.05) (Figure 3). For G→A and T(U)→C transitions, adding GreA/B to the in vitro reaction reduced the differences between the in vivo and in vitro samples (Figure 3). This result indicates extensive RNA proofreading by GreA/B in living cells, as suggested previously (55). Thus, our data suggest that transcription in wild-type E. coli cells containing functional GreA/B proteins has fidelity similar to that of transcription in vitro in the presence of Gre factors.
Figure 3.
Scatter plots of transition-error rates for in vivo and in vitro Mg2+ samples with (left) or without (right) GreA/B in the cDNA (top) and internal control (bottom). All symbols are the same as in Figure 2. The cutoff for error rates is applied for the two-tailed nonparametric t-test, but not for the scatter plots. The n for the t-test is the same as in Figure 2.
The increased rate of C→T(U) transitions in the in vivo sample was insensitive to GreA/B, suggesting that these errors may have been introduced by DNAP during the five additional cycles of the first PCR, which were applied only to the in vivo RNA sample during cDNA synthesis. The same increased background may dilute G→A and T(U)→C errors in the in vivo sample. The detected difference in C→T(U) error rates might also be caused by modest cellular stress during the shift to 42°C for induction of the PR promoter of the rpoBC gene (see ‘Materials and Methods’ section), by a decrease in the intrinsic fidelity of RNAP for certain types of errors at elevated temperature, or by spontaneous cytosine deamination before or during RNA purification from E. coli cells. Although the in vivo frequency of spontaneous cytosine deamination in DNA is known to be on the order of 10−9 (56), the corresponding rate for RNA is unknown. We also observed minor increases in the error rates for G→A and C→T(U) transitions in the internal controls for the in vivo sample compared with the standard in vitro sample (Figure 3). This was likely due to errors introduced during DNA oligonucleotide synthesis rather than DNAP errors during PCR [see Supplementary Figure S2; the error rates for oligo-DNA synthesis are slightly higher (by ∼1 × 10−4) than those of transcription for all four types of transition].
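To put the PCR explanation in rough numbers, a commonly used approximation estimates the substitution frequency accumulated during PCR as f ≈ ER × d, where ER is the polymerase error rate per base per doubling and d is the number of doublings. The sketch below assumes this approximation and a hypothetical ER of 1 × 10−6; neither value was measured in this study. A second helper shows how an additive artifact background shared by two samples compresses (dilutes) the apparent difference between them.

```python
from math import log2


def pcr_error_frequency(error_rate_per_doubling: float, cycles: int,
                        efficiency: float = 1.0) -> float:
    """Approximate substitution frequency per base accumulated over PCR cycles.

    Uses f = ER * d, where d = cycles * log2(1 + efficiency) template doublings.
    ER (per base per doubling) is an assumed, polymerase-specific value.
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    doublings = cycles * log2(1.0 + efficiency)
    return error_rate_per_doubling * doublings


def apparent_fold_change(rate_a: float, rate_b: float, background: float) -> float:
    """Fold change between two error rates after adding a common artifact background."""
    return (rate_a + background) / (rate_b + background)


# Hypothetical DNAP error rate of 1e-6 per base per doubling.
extra = pcr_error_frequency(1e-6, cycles=5)
print(f"background from 5 extra cycles: {extra:.1e} per base")
# A true 2-fold difference shrinks when both samples share a 1e-4 background.
print(apparent_fold_change(2e-4, 1e-4, 1e-4))
```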
Backtracking controls mismatch extension
We performed a hierarchical clustering analysis (57) of the error rates at all positions used for the statistical analysis, that is, positions with error rates below the threshold value of 3 × 10−4 b−1. This analysis links DNA positions through a series of branches according to the patterns and levels of error-rate fluctuation across transcription conditions. It thus identifies DNA positions with similar error-rate profiles under the standard in vitro, error-prone (Mn2+), error-proof (GreA/B) and in vivo conditions. We chose G→A errors because they were the most sensitive to Mn2+ and GreA/B (Figure 2). The G→A errors were clustered into major groups C–G, in which the error rates were increased by Mn2+, and minor groups A and B, in which they were not affected by Mn2+ (Figure 4A). The majority of the Mn2+-sensitive errors in groups C–G were also reduced by GreA/B (Figure 4A). This significant overlap strongly indicates a transcriptional origin of these errors, which were susceptible to chemical and protein factors specifically targeting transcription fidelity. We further argue that the Mn2+-insensitive errors belonged to RNA-seq artifacts, which become more prominent at sequences exhibiting relatively low transcription error rates. Alternatively, these sequences might generate ‘true’ transcription errors with intrinsic resistance to Mn2+. Note that the averaged error rate in group A (1.1 × 10−4) was lower than in group B (1.8 × 10−4), suggesting that transcription errors from the former group are more diluted by artifact errors.
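A minimal sketch of this kind of analysis follows, assuming agglomerative clustering with average linkage and Euclidean distance on log10-transformed error rates; the method in reference 57 may use a different metric or linkage. The four-condition profiles and position labels are made up for illustration.

```python
from itertools import combinations
from math import dist, log10

# Hypothetical G->A error-rate profiles (per base) at four conditions:
# (in vitro Mg2+, in vitro Mn2+, in vitro Mg2+ + GreA/B, in vivo).
PROFILES = {
    "a1": (1.0e-4, 1.0e-4, 1.0e-4, 1.0e-4),  # Mn2+-insensitive
    "a2": (1.2e-4, 1.1e-4, 1.2e-4, 1.1e-4),  # Mn2+-insensitive
    "f1": (1.5e-4, 6.0e-4, 5.0e-5, 1.0e-4),  # up with Mn2+, down with GreA/B
    "f2": (1.6e-4, 7.0e-4, 6.0e-5, 1.1e-4),
}


def average_linkage_clusters(profiles, n_clusters):
    """Agglomerative clustering (average linkage, Euclidean distance on log10 rates)."""
    points = {name: [log10(r) for r in rates] for name, rates in profiles.items()}
    clusters = [[name] for name in sorted(points)]

    def linkage(c1, c2):
        # Mean pairwise distance between members of two clusters
        total = sum(dist(points[a], points[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Merge the closest pair of clusters
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]))
        merged = sorted(clusters[i] + clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return sorted(clusters)


print(average_linkage_clusters(PROFILES, 2))  # [['a1', 'a2'], ['f1', 'f2']]
```

Working in log space makes a 2-fold change count equally at 1 × 10−5 and 1 × 10−4, so positions group by how they respond to Mn2+ and GreA/B rather than by their absolute error level.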
In each cluster, we aligned the 9-nt sequences located immediately upstream of the G→A error site (Figure 4A), on the assumption that the catalytic properties of RNAP are determined mainly by the 9-bp RNA–DNA hybrid of the TEC (58). Interestingly, the RNA–DNA hybrid sequences for Mn2+-insensitive errors were strongly enriched in short A/T(U) tracts (group A in Figure 4A), as opposed to the more balanced sequence content of the sites affected by Mn2+ and GreA/B (the representative Mn2+-sensitive group F in Figure 4A). A/U-rich sequences in the RNA–DNA hybrid have been shown to promote RNAP backtracking on DNA (59), one of the mechanisms increasing RNAP fidelity (21). We therefore hypothesized that the relatively low frequency of Mn2+-insensitive transcription errors was related to increased backtracking of RNAP.
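The sequence feature described here can be quantified with a short helper. The sketch below is an illustration under stated assumptions: it takes the non-template (RNA-like) strand, treats the 9 nt immediately 5′ of the error site as the RNA–DNA hybrid, and reports the A/T(U) fraction and the longest A/T(U) run. The function names and the example sequence are hypothetical.

```python
def upstream_window(seq: str, error_pos: int, width: int = 9) -> str:
    """Return the `width` nt immediately 5' of a 1-based error position.

    `seq` is the RNA-like (non-template) strand, so the window approximates the
    RNA-DNA hybrid of a TEC whose 3' end sits just before the error site.
    """
    start = error_pos - 1 - width
    if start < 0 or error_pos > len(seq):
        raise ValueError("window extends beyond the sequence")
    return seq[start:error_pos - 1].upper()


def at_fraction(window: str) -> float:
    """Fraction of A, T or U residues in the window."""
    return sum(base in "ATU" for base in window.upper()) / len(window)


def longest_at_tract(window: str) -> int:
    """Length of the longest uninterrupted A/T(U) run."""
    best = run = 0
    for base in window.upper():
        run = run + 1 if base in "ATU" else 0
        best = max(best, run)
    return best


# Hypothetical sequence with a G (error site) at position 13.
seq = "CCCCCAAATTAAG"
hybrid = upstream_window(seq, 13)
print(hybrid, round(at_fraction(hybrid), 2), longest_at_tract(hybrid))  # CCAAATTAA 0.78 7
```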
To address backtracking as a potential error-correcting mechanism during processive elongation, we arbitrarily selected one sequence from each group, 170G (Mn2+-insensitive group A, relatively low error rate) and 474G (Mn2+-sensitive group F, relatively high error rate) (Figure 4B), and analyzed RNAP backtracking at these sequences by ExoIII footprinting (17,44,60). The dynamic pattern of DNA digestion by ExoIII provides information on the distance and stability of individual backtracked states of RNAP. The TEC was assembled with a 9-nt RNA hybridized to a DNA template containing the 170G or 474G sequence, with a modification required for TEC walking (see ‘Materials and Methods’ section) (37). The 9-nt RNA was elongated to 18 nt with NTPs, yielding TEC18A (corresponding to the 170G sequence) or TEC18C (corresponding to the 474G sequence), in which the new 3′ RNA end is located immediately 5′ of the site where the G→A error was detected (Figure 4C). When RNAP reversibly backtracks, ExoIII, which digests DNA from the rear end of RNAP (Figure 4C), produces expanded rear-end boundaries corresponding to the backtracked states; these convert to the boundary of the active state on prolonged incubation with the nuclease (44). We observed two differences in backtracking at the 170G and 474G sequences (Figure 4D). TEC18A/170G equilibrated between the active state and 1–10 bp stably backtracked states within 90 s of incubation with ExoIII, whereas TEC18C/474G equilibrated primarily in the active state with a minor 6-bp backtracked state. Thus, backtracking from the active state was more strongly induced or stabilized in TEC18A/170G than in TEC18C/474G, as predicted from the difference in their A/T(U) content (Figure 4A). Interestingly, AMP misincorporation in TEC18C/474G (mimicking the G→A error during processive elongation) did not cause RNAP to advance 1 bp forward on the DNA (Figure 4D).
This result indicates that TEC19A remains in a 1-bp backtracked state after misincorporation, consistent with previous findings on the effect of misincorporation on backtracking (15,21). We concluded that the higher backtracking potential, and thus more efficient proofreading, at the 170G sequence compared with the 474G sequence accounts for the relatively lower error rate detected by RNA-seq (see Supplementary text and Supplementary Figure S3).
Next, we tested whether backtracking on the 474G sequence affects G→A misincorporation in the presence of Mn2+, as indicated by the clustering analysis. To mimic G→A misincorporation at the 474G site during processive elongation, we measured the rate of AMP misincorporation in TEC18C in the presence of Mg2+ or Mn2+ (Figure 5A and B). As expected, the Mn2+-sensitive TEC18C misincorporated AMP more rapidly in the presence of Mn2+ than in the presence of Mg2+ (Figure 5C). Remarkably, we detected a high level of endonucleolytic RNA cleavage 7 nt upstream of the 3′ RNA end in the presence of Mg2+, and a substantially lower level in the presence of Mn2+ (Figure 5C). This cleavage was consistent with the ExoIII footprinting data (Figure 4D), which showed backtracking of this complex by 6 bp. We propose that backtracking after misincorporation generated a substrate for the cleavage in this complex with or without GreA/B. At the longest incubation times (6–24 min) with noncognate ATP, Mn2+ also appeared to decrease extension of the 3′ error (19A* product) with the next cognate substrate (21A* and 22A** products) (Figure 5C and E). One would expect these opposite effects of Mn2+ on error correction and error extension to counteract one another, leading to no net impact of Mn2+ on fidelity. However, at shorter times (10–90 s), when the contribution of the slow intrinsic RNA cleavage was negligible, Mn2+ stimulated rather than inhibited error extension (Figure 5C and E). This effect was confirmed in an experiment, described below, with a TEC in which the cleavage does not occur. Thus, Mn2+ appeared to decrease transcription fidelity on the 474G sequence by suppressing intrinsic RNA proofreading in the backtracked complex and by promoting extension of the error with the next cognate NMP, which allowed the error to propagate into full-length RNA.
To test whether backtracking was the major error-correction mechanism at the 474G site, we assembled a version of TEC18C/474G in which the RNA was shortened from 18 to 8 nt by removing 10 nt from the 5′ end (TEC8C, Figure 5B). Shortening of the nascent RNA has been shown to prevent backtracking (44). Elimination of backtracking dramatically enhanced both the G→A error and extension of the error with the next cognate AMP at this sequence (Figure 5C and D). The efficiency and rate of mismatch extension increased 4-fold and >10-fold, respectively, in TEC8C compared with the original TEC18C in the presence of Mg2+ or Mn2+ (Figure 5E). Thus, backtracking followed by error correction through intrinsic RNA cleavage represents a major mechanism controlling the G→A error at the 474G site during processive transcription. The slow rate of intrinsic cleavage at the 474G site indicates that a substantial fraction of 3′ RNA misincorporations cannot propagate to the full-length transcript because RNAP undergoes backtrack pausing after misincorporation. Applying RNA-seq to nascent transcripts isolated from backtracked elongation complexes containing a 3′ error, using the previously established NET-seq method (61), would help address the effect of mismatch extension on the detected error rates.
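Fold changes in the "efficiency" and "rate" of mismatch extension are typically read off fits of product time courses. The sketch below assumes that efficiency corresponds to the plateau amplitude A and rate to the apparent constant k of a single exponential, A(1 − e^(−kt)); the study's own fitting procedure (Figure 5E) may differ. It uses a pure-Python grid search over k, with the least-squares amplitude solved in closed form for each k, so no fitting library is needed. The time courses are synthetic and noise-free.

```python
from math import exp, log


def single_exponential(t, amplitude, k):
    """Fraction of RNA extended past the mismatch at time t: A * (1 - exp(-k t))."""
    return amplitude * (1.0 - exp(-k * t))


def fit_single_exponential(times, fractions, k_min=1e-4, k_max=10.0, n_grid=2000):
    """Least-squares fit of A and k.

    k is scanned on a log-spaced grid; for each k the optimal amplitude has the
    closed form A = sum(y*g) / sum(g*g) with g = 1 - exp(-k t).
    Returns (A, k, residual sum of squares).
    """
    best = None
    for i in range(n_grid):
        k = exp(log(k_min) + (log(k_max) - log(k_min)) * i / (n_grid - 1))
        g = [1.0 - exp(-k * t) for t in times]
        denom = sum(x * x for x in g)
        if denom == 0.0:
            continue
        amp = sum(y * x for y, x in zip(fractions, g)) / denom
        rss = sum((y - amp * x) ** 2 for y, x in zip(fractions, g))
        if best is None or rss < best[2]:
            best = (amp, k, rss)
    return best


# Synthetic, noise-free time courses (s); parameters are hypothetical.
times = [10, 30, 60, 90, 360, 720, 1440]
slow = [single_exponential(t, 0.15, 0.004) for t in times]  # backtracking-competent
fast = [single_exponential(t, 0.60, 0.05) for t in times]   # backtracking-deficient
a1, k1, _ = fit_single_exponential(times, slow)
a2, k2, _ = fit_single_exponential(times, fast)
print(f"efficiency ratio {a2 / a1:.1f}, rate ratio {k2 / k1:.1f}")
```

The grid spacing limits the precision of k to roughly 0.3% here; real, noisy data would call for a proper nonlinear fit with confidence intervals.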
Error rate depends on the nucleotide at the 3′ end of the transcript
We noted that not all short A/T tracts followed by a 3′ guanine residue identified by RNA-seq exhibited a low frequency of G→A errors, suggesting that a high propensity for backtracking may not be the only parameter controlling transcription fidelity (12). In search of another fidelity parameter embedded in the sequence context, we aligned the sequences surrounding the G→A sites with the lowest 10% or the highest 10% of error rates and displayed each set as a sequence logo (62,63). Taking n as the position of the error, this analysis revealed a strong preference for adenine at the n − 1 position among the low-error-rate sites and for cytosine at the same position among the high-error-rate sites (Figure 6A). It also revealed sequence preferences at the n − 1 and n − 2 DNA positions for the other types of transition errors (Supplementary Figure S4).
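The column statistics behind a sequence logo can be sketched in a few lines. This illustration computes, for each aligned column, the residue counts (which Figure 6A plots directly) and the standard information content in bits, log2(4) minus the Shannon entropy. No small-sample correction is applied here, although logo software may apply one; the function name and example sequences are hypothetical.

```python
from collections import Counter
from math import log2


def column_profile(aligned, alphabet_size=4):
    """Per-column residue counts and information content (bits) for an alignment."""
    width = len(aligned[0])
    if any(len(s) != width for s in aligned):
        raise ValueError("sequences must be aligned to equal length")
    profile = []
    for i in range(width):
        counts = Counter(s[i].upper() for s in aligned)
        n = sum(counts.values())
        # Shannon entropy of the column, in bits
        entropy = -sum((c / n) * log2(c / n) for c in counts.values())
        profile.append((dict(counts), log2(alphabet_size) - entropy))
    return profile


# Hypothetical aligned columns n-2, n-1, n, n+1 for low-error G->A sites.
low_error_sites = ["TAGC", "AAGT", "TAGA", "CAGT"]
for label, (counts, bits) in zip(("n-2", "n-1", "n", "n+1"), column_profile(low_error_sites)):
    print(label, counts, f"{bits:.2f} bits")
```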
Figure 6.
A 3′ residue in the nascent transcript determines the G→A error rate. (A) DNA logo derived from a sequence alignment around the dG residues coding for low or high G→A error rates. The lowest 10% (left) and the highest 10% (right) of all G→A error rates (<1 × 10−3), averaged over five different RNA preparations, were used for the analysis. Residue frequencies from n − 2 to n + 1 (the G→A error occurs at site n) were plotted with WebLogo (63). The y-axis is not on the usual log2 (bits) scale; instead, it shows the actual number of residues of each type. (B) DNA/RNA scaffold for testing the effect of a dC→dA substitution at the n − 1 site of the DNA. TEC18C (n − 1 = C) and TEC18A (n − 1 = A) on the 474G sequence are shown. (C) Biochemical G→A error rates in TEC18C or TEC18A as determined by the NTP competition assay (see text for details) (64,65). (D) Time course of AMP misincorporation for GMP in TEC18C or TEC18A. The curves represent double-exponential (TEC18C) or single-exponential (TEC18A) fits of the data; apparent rate constants (k) are shown. The slower misincorporation rate obtained from the double-exponential fit for TEC18C was related to intrinsic cleavage of the 3′ RNA in this complex.
Next, we tested whether the residue at the n − 1 position of the DNA or RNA affects the RNAP misincorporation rate. We used a previously developed NTP competition assay that monitors single NMP misincorporation in the presence of a mixture of cognate and noncognate NTPs (64,65). We compared the G→A error rates in TEC18C/474G and in TEC18A/474G, which carries a C18 (n − 1)→A substitution in the DNA (Figure 6B, C and Supplementary Figure S5A–C). As predicted by the sequence logo (Figure 6A), the C18 (n − 1)→A substitution decreased the biochemical G19→A error rate on the 474G sequence in both Mg2+ and Mn2+ (Figure 6C) without affecting RNAP backtracking or intrinsic transcript cleavage (Supplementary Figure S5D and E). The n − 1 mutation also caused a 10-fold reduction in the rate of G19→A misincorporation in a single AMP misincorporation assay lacking the cognate GTP (Figure 6D). Interestingly, the n − 1 mutation affected AMP misincorporation without a strong effect on cognate GMP incorporation (data not shown). This difference suggests that the chemical nature of the 3′ RNA–DNA base pair plays a major role in the binding or addition of a noncognate NTP, with only a minor contribution to the same processes for a cognate substrate.
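A competition experiment of this kind can be converted into an error rate if one assumes, as in standard competition fidelity assays, that the cognate and noncognate NTPs compete for the same site, so that the ratio of misincorporated to correct products equals the ratio of (kpol/Kd × NTP concentration) for the two substrates. Solving for the specificity ratio gives the sketch below. The exact calculation in references 64 and 65 may differ, and the function name and example values are hypothetical.

```python
def competition_error_rate(product_noncognate: float, product_cognate: float,
                           conc_cognate: float, conc_noncognate: float) -> float:
    """Specificity ratio (kpol/Kd)noncognate / (kpol/Kd)cognate from one competition.

    Assumes P_non / P_cog = (kpol/Kd)_non [NTP_non] / ((kpol/Kd)_cog [NTP_cog]),
    i.e. both NTPs compete for the same template position.
    """
    if product_cognate <= 0 or conc_noncognate <= 0:
        raise ValueError("cognate product and noncognate NTP concentration must be positive")
    return (product_noncognate / product_cognate) * (conc_cognate / conc_noncognate)


# Hypothetical: 2 units of AMP-terminated RNA vs 1000 units of GMP-terminated RNA,
# with 10 uM GTP competing against 1000 uM ATP.
print(competition_error_rate(2.0, 1000.0, 10.0, 1000.0))
```

Normalizing by the NTP concentrations is what lets an excess of noncognate NTP be used to make rare misincorporation products detectable without inflating the reported error rate.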
DISCUSSION
Our work provides the first evidence that RNA-seq can assess physiological transcription error rates even in the presence of artifact errors. We directly detected changes in transition-error rates in the range of 4 × 10−5 to 3 × 10−4 b−1 (Figures 2 and 3). These limits indicate that the baseline of standard transition-error rates is on the order of 10−5 or lower. Our finding that GreA/B increases the fidelity of processive transcription in vitro to the level of fidelity in vivo (Figures 2 and 3) opens ample opportunities to apply RNA-seq to the evaluation of transition-type transcription errors in E. coli cells harboring viable greA/greB deletions or mutations in RNAP subunits that reduce fidelity in vitro (19,66), and in wild-type cells under different growth conditions, including biological stress and DNA damage (25,67). Although we did not detect any obvious hotspots (29) for RNAP errors within the tested sequence, our results do not exclude the existence of such hotspots genome-wide. Transversion errors by RNAP seem to occur at lower rates than transitions, hindering their detection by RNA-seq (Supplementary Figure S6). The transversions appear to favor conversion to thymine or adenine rather than to cytosine or guanine (Supplementary Figure S6), suggesting that RNAP has transversion-error preferences similar to those of RT and/or DNAP.
Although transcription fidelity has been extensively studied by single NMP incorporation in elongation complexes deprived of NTP substrates, the mechanism of fidelity control during processive transcription has awaited an appropriate methodology. Our new approach, which combines RNA-seq with biochemical analyses of transcription errors that propagate to the full-length RNA, revealed two sequence-specific mechanisms operating during processive transcription at physiological NTP concentrations: (i) NTP selection related to the chemical nature of the DNA–RNA base pair immediately upstream of the error site, and (ii) postincorporation error correction by intrinsic transcript cleavage in the backtracked RNAP (Figure 7). A recent biochemical study suggested that a noncognate NTP is rejected from RNAP through formation of a stressed sugar–phosphate backbone in the template DNA strand, which involves angling of the 3′ RNA–DNA base pair to align the NTP for catalysis (16). The authors argued that the angling may weaken stacking of the 3′ base pair and ribose contacts with a noncognate NTP, inducing its preferential rejection from the active center without a significant effect on a properly paired cognate NTP. We speculate that 3′ rA-dT and 3′ rC-dG base pairs may have unequal stacking potentials and thus differentially affect noncognate NTP rejection. Further analysis is warranted to generalize this sequence-specific mechanism of NTP selection. When preincorporation NTP selection fails, enhanced backtracking that interferes with extension of the 3′ RNA mismatch provides additional time to proofread the error by RNA cleavage. In this scenario, trans-acting factors that promote backtracking, such as DNA-bound proteins or nucleosomes, may increase fidelity, whereas factors that interfere with backtracking, such as a trailing RNAP, ribosomes (in prokaryotes) or secondary structure in the nascent RNA, may decrease fidelity.
Figure 7.
Multiple pathways for control of RNAP fidelity. The transcription error rate is determined by the 3′ RNA–DNA base pair in the TEC (preincorporation substrate selection) and by the backtracking propensity of RNAP (postincorporation proofreading). The 3′ RNA–DNA base pair controls the misincorporation rate of a noncognate substrate (indicated by an asterisk). DNA sequences such as A/T-rich tracts, and protein factors that promote backtracking, increase fidelity by decreasing extension of the 3′ RNA error with the next cognate NMP (shaded). The error is corrected by intrinsic or Gre-assisted transcript cleavage in the backtracked TEC. Irreversible backtrack arrest of a TEC carrying a 3′ RNA error may result from inefficient transcript cleavage in the backtracked complex (the dead-end pathway).
In eukaryotic transcription, Nesser et al. previously analyzed transcription errors throughout a ∼450-bp cDNA of the CAN1 transcript in yeast by Sanger sequencing (9). They reported a substitution rate of 1.3 × 10−3/bp, much higher than the rate observed for E. coli RNAP in our study and the rates determined for yeast RNAP II in vitro (17). The authors argued for a transcriptional rather than artifactual origin of these errors by showing that the rate increased to 1.7 × 10−3/bp in mutant cells lacking the Rpb9 subunit of RNAP II. Rpb9 has been linked to transcription fidelity on the basis of in vitro misincorporation assays (49). Surprisingly, deletion of the yeast DST1 gene, which encodes the RNA proofreading factor TFIIS (a GreA/B analog), had almost no effect on the error rate in vivo (9). This result also differs from our observation of a major impact of GreA/B proteins on transcription fidelity in E. coli. The source of these differences requires further investigation.
Future applications of RNA-seq will allow monitoring of genome-wide transcription fidelity under different growth conditions and in different cell types. Would a read depth of 10⁵ (instead of the 10⁶ used here), which covers a frequency of 10−5 b−1, be sufficient for detecting changes in transition-type transcription errors? This depth reduction could allow determination of fidelity across ∼10⁵ bases of the transcriptome in a single Illumina sequencing run and lead to a significant cost reduction. We answer this question positively by showing that, for an arbitrarily chosen position 578 of the rpoBC transcript, the G→A transition-error rate was not significantly affected by a 10-fold decrease in depth to 2 × 10⁵ (Supplementary Figure S7). At this reduced depth, we successfully detected the responses of the error rates to Mn2+ and GreA/B in vitro. Thus, a read depth of 10⁵ appears sufficient to assess increases in transition errors above the basal level across an entire transcriptome, with the caveat that sensitivity to transcription errors varies with mRNA levels among genes. Another potential issue for genome-wide RNA-seq is the additional PCR cycles and barcoding bias (68) that accompany adapter ligation to cDNA during library preparation. Our cDNA preparation included 11 cycles of PCR for the in vitro transcription samples and 16 cycles for the in vivo sample. However, we found that the in vivo sample had significantly lower G→A and T(U)→C error rates than the standard (Mg2+) in vitro sample, indicating no significant contribution of PCR artifacts to the types of transcription errors detected in this work (Figure 3 and Supplementary Figure S2). This suggests that the increase in PCR cycles did not dilute transcription errors below the detection limit.
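The depth question can be framed with simple counting statistics. Assuming that error-containing reads at a position follow a Poisson distribution with mean λ = depth × rate (an idealization that ignores PCR duplicates, overdispersion and the artifact background, so it is a best case), the sketch below reports λ, the probability of seeing at least one error read, and the relative standard error 1/√λ of the rate estimate. The function names are hypothetical.

```python
from math import exp, inf, sqrt


def expected_error_reads(depth: float, rate: float) -> float:
    """Mean number of reads carrying a given substitution at one position."""
    return depth * rate


def prob_at_least_one(depth: float, rate: float) -> float:
    """Poisson probability of observing at least one error-containing read."""
    return 1.0 - exp(-expected_error_reads(depth, rate))


def relative_standard_error(depth: float, rate: float) -> float:
    """Poisson relative standard error of the estimated rate, 1 / sqrt(lambda)."""
    lam = expected_error_reads(depth, rate)
    return 1.0 / sqrt(lam) if lam > 0 else inf


for depth in (1e6, 2e5, 1e5):
    for rate in (1e-5, 1e-4):
        print(f"depth {depth:.0e}, rate {rate:.0e}: "
              f"lambda={expected_error_reads(depth, rate):.1f}, "
              f"P(>=1)={prob_at_least_one(depth, rate):.3f}, "
              f"RSE={relative_standard_error(depth, rate):.2f}")
```

Under these assumptions, a 10−5 b−1 rate at 10⁵ depth yields about one expected error read per position, whereas rates raised toward 10−4 yield tens of reads, which is one way to see why detecting increases over the basal level is less demanding than estimating the basal rate itself.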
Our data also clearly indicate that the most significant technical improvement for reducing artifact transition errors on the Illumina platform would be suppression of errors arising during the second read of paired-end sequencing, which typically occur at a frequency of >3 × 10−4 b−1 (Figure 1D). The tag-based method (25–27) and the overlapping read-pair method (30) have strong potential for identifying sequencing errors. Further improvement of the RNA-seq bioinformatics could discriminate between transcription frameshift errors (not analyzed in this work) and insertion/deletion artifacts associated with the Illumina platform and read mapping. This approach may enable detection of physiologically relevant transcriptional slippage, which contributes to slippage-associated diseases in humans, at the short homopolymeric tracts and dinucleotide repeats broadly present in transcribed genes (23,69).
It is worth noting that transcription errors occurring at a rate of 10−5 may have a deleterious effect on genome stability by inducing prolonged stalling of RNAP at multiple sites across >10⁵ bp of transcribed regions in a genome. Irreversibly arrested TECs should block DNA replication and subsequent rounds of transcription, leading to double-strand DNA breaks and cessation of gene expression (64,70). Prolonged RNAP stalling is exemplified by the almost irreversible loss of RNAP catalytic activity after NMP misincorporation into TEC18, which was not accompanied by dissociation of RNAP from DNA (Figure 5). This mechanism differs from the previously proposed production of toxic proteins due to transcription errors, which requires the assumption that the error rate of transcription is comparable with that of translation. Transcriptional misreading may have an impact on cell physiology comparable with that of translational misreading, owing to multiple rounds of translation of an erroneous mRNA molecule.
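The overlapping read-pair idea (30) can be sketched as follows: where the two mates of a pair cover the same bases, a base is called only if both mates agree and both base qualities pass a threshold; otherwise it is masked as N. This sketch assumes that read 2 has already been reverse-complemented (with its quality list reversed) and trimmed to the same reference window as read 1, and that qualities are Phred integers. The threshold of 30 and the function names are hypothetical choices.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T, N)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def overlap_consensus(read1: str, qual1, read2_rc: str, qual2_rc, min_q: int = 30) -> str:
    """Call a base only where both overlapping mates agree with quality >= min_q.

    read2_rc / qual2_rc: second mate already reverse-complemented (qualities
    reversed) and trimmed to the same reference window as read1.
    """
    if not (len(read1) == len(qual1) == len(read2_rc) == len(qual2_rc)):
        raise ValueError("reads and qualities must cover the same overlap")
    calls = []
    for b1, q1, b2, q2 in zip(read1.upper(), qual1, read2_rc.upper(), qual2_rc):
        calls.append(b1 if b1 == b2 and min(q1, q2) >= min_q else "N")
    return "".join(calls)


# Mate 2 as sequenced (opposite strand), then oriented to match mate 1.
mate2_raw = "TTCGT"
print(overlap_consensus("ACGTA", [35] * 5, reverse_complement(mate2_raw), [35] * 5))
```

Masking disagreements rather than choosing the higher-quality base trades some coverage for specificity, which suits the goal of measuring rare substitutions at 10−4 to 10−5 frequencies.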
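A back-of-the-envelope version of this argument, assuming that errors arise independently at a uniform rate ε per base (an idealization, since the data above show strong sequence dependence), gives the expected number λ of misincorporation events in a single pass of RNAP over L transcribed bases and the Poisson probability that at least one occurs:

```latex
\lambda = \varepsilon L = \left(10^{-5}\,\mathrm{b}^{-1}\right)\left(10^{5}\,\mathrm{b}\right) = 1,
\qquad
P(\geq 1\ \text{error}) = 1 - e^{-\lambda} = 1 - e^{-1} \approx 0.63
```

Under these assumptions, roughly two out of three passes over 10⁵ bp would contain at least one misincorporation, each a potential site of prolonged stalling.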
ACCESSION NUMBERS
GEO number: GSE46479.
SUPPLEMENTARY DATA
Supplementary Data are available at NAR online.
FUNDING
Fellowship from JSPS (to M.I. in part); Intramural Research Program of the National Institutes of Health, National Cancer Institute, Center for Cancer Research (to M.K.). Funding for open access charge: The Intramural Research Program of the National Institutes of Health, National Cancer Institute, Center for Cancer Research.
Conflict of interest statement. None declared.
ACKNOWLEDGEMENTS
We thank D.L. Court of NCI for discussions and critical reading of the manuscript; W. Tang, T.D. Schneider and Y. Zhao of NCI and T. Sakamoto of NAIST for technical comments and support in sequence analysis; S.H. Hughes, M.L. Kireeva, S. Kakar, M. Bubunenko and J.N. Strathern of NCI for discussions and comments on the manuscript; W. Shao of NCI for the Perl script. We also thank the NCI sequencing facility for Illumina sequencing and early bioinformatic analysis.
REFERENCES
- 1. Strathern JN, Jin DJ, Court DL, Kashlev M. Isolation and characterization of transcription fidelity mutants. Biochim. Biophys. Acta. 2012;1819:694–699. doi: 10.1016/j.bbagrm.2012.02.005.
- 2. Gordon AJ, Halliday JA, Blankschien MD, Burns PA, Yatagai F, Herman C. Transcriptional infidelity promotes heritable phenotypic change in a bistable gene network. PLoS Biol. 2009;7:e44. doi: 10.1371/journal.pbio.1000044.
- 3. Goldsmith M, Tawfik DS. Potential role of phenotypic mutations in the evolution of protein expression and stability. Proc. Natl Acad. Sci. USA. 2009;106:6197–6202. doi: 10.1073/pnas.0809506106.
- 4. Paoloni-Giacobino A, Rossier C, Papasavvas MP, Antonarakis SE. Frequency of replication/transcription errors in (A)/(T) runs of human genes. Hum. Genet. 2001;109:40–47. doi: 10.1007/s004390100541.
- 5. Rodin SN, Rodin AS, Juhasz A, Holmquist GP. Cancerous hyper-mutagenesis in p53 genes is possibly associated with transcriptional bypass of DNA lesions. Mutat. Res. 2002;510:153–168. doi: 10.1016/s0027-5107(02)00260-9.
- 6. Hubbard K, Catalano J, Puri RK, Gnatt A. Knockdown of TFIIS by RNA silencing inhibits cancer cell proliferation and induces apoptosis. BMC Cancer. 2008;8:133. doi: 10.1186/1471-2407-8-133.
- 7. Blank A, Gallant JA, Burgess RR, Loeb LA. An RNA polymerase mutant with reduced accuracy of chain elongation. Biochemistry. 1986;25:5920–5928. doi: 10.1021/bi00368a013.
- 8. Taddei F, Hayakawa H, Bouton M, Cirinesi A, Matic I, Sekiguchi M, Radman M. Counteraction by MutT protein of transcriptional errors caused by oxidative damage. Science. 1997;278:128–130. doi: 10.1126/science.278.5335.128.
- 9. Nesser NK, Peterson DO, Hawley DK. RNA polymerase II subunit Rpb9 is important for transcriptional fidelity in vivo. Proc. Natl Acad. Sci. USA. 2006;103:3268–3273. doi: 10.1073/pnas.0511330103.
- 10. Shaw RJ, Bonawitz ND, Reines D. Use of an in vivo reporter assay to test for transcriptional and translational fidelity in yeast. J. Biol. Chem. 2002;277:24420–24426. doi: 10.1074/jbc.M202059200.
- 11. Larson MH, Zhou J, Kaplan CD, Palangat M, Kornberg RD, Landick R, Block SM. Trigger loop dynamics mediate the balance between the transcriptional fidelity and speed of RNA polymerase II. Proc. Natl Acad. Sci. USA. 2012;109:6555–6560. doi: 10.1073/pnas.1200939109.
- 12. Sydow JF, Brueckner F, Cheung AC, Damsma GE, Dengl S, Lehmann E, Vassylyev D, Cramer P. Structural basis of transcription: mismatch-specific fidelity mechanisms and paused RNA polymerase II with frayed RNA. Mol. Cell. 2009;34:710–721. doi: 10.1016/j.molcel.2009.06.002.
- 13. Kashkina E, Anikin M, Brueckner F, Pomerantz RT, McAllister WT, Cramer P, Temiakov D. Template misalignment in multisubunit RNA polymerases and transcription fidelity. Mol. Cell. 2006;24:257–266. doi: 10.1016/j.molcel.2006.10.001.
- 14. Rosenberger RF, Hilton J. The frequency of transcriptional and translational errors at nonsense codons in the lacZ gene of Escherichia coli. Mol. Gen. Genet. 1983;191:207–212. doi: 10.1007/BF00334815.
- 15. Erie DA, Hajiseyedjavadi O, Young MC, von Hippel PH. Multiple RNA polymerase conformations and GreA: control of the fidelity of transcription. Science. 1993;262:867–873. doi: 10.1126/science.8235608.
- 16. Sosunova E, Sosunov V, Epshtein V, Nikiforov V, Mustaev A. Control of transcriptional fidelity by active center tuning as derived from RNA polymerase endonuclease reaction. J. Biol. Chem. 2013;288:6688–6703. doi: 10.1074/jbc.M112.424002.
- 17. Kireeva ML, Nedialkov YA, Cremona GH, Purtov YA, Lubkowska L, Malagon F, Burton ZF, Strathern JN, Kashlev M. Transient reversal of RNA polymerase II active site closing controls fidelity of transcription elongation. Mol. Cell. 2008;30:557–566. doi: 10.1016/j.molcel.2008.04.017.
- 18. Roghanian M, Yuzenkova Y, Zenkin N. Controlled interplay between trigger loop and Gre factor in the RNA polymerase active centre. Nucleic Acids Res. 2011;39:4352–4359. doi: 10.1093/nar/gkq1359.
- 19. Holmes SF, Santangelo TJ, Cunningham CK, Roberts JW, Erie DA. Kinetic investigation of Escherichia coli RNA polymerase mutants that influence nucleotide discrimination and transcription fidelity. J. Biol. Chem. 2006;281:18677–18683. doi: 10.1074/jbc.M600543200.
- 20. Bar-Nahum G, Epshtein V, Ruckenstein AE, Rafikov R, Mustaev A, Nudler E. A ratchet mechanism of transcription elongation and its control. Cell. 2005;120:183–193. doi: 10.1016/j.cell.2004.11.045.
- 21. Shaevitz JW, Abbondanzieri EA, Landick R, Block SM. Backtracking by single RNA polymerase molecules observed at near-base-pair resolution. Nature. 2003;426:684–687. doi: 10.1038/nature02191.
- 22. Wagner LA, Weiss RB, Driscoll R, Dunn DS, Gesteland RF. Transcriptional slippage occurs during elongation at runs of adenine or thymine in Escherichia coli. Nucleic Acids Res. 1990;18:3529–3535. doi: 10.1093/nar/18.12.3529.
- 23. Baranov PV, Hammer AW, Zhou J, Gesteland RF, Atkins JF. Transcriptional slippage in bacteria: distribution in sequenced genomes and utilization in IS element gene expression. Genome Biol. 2005;6:R25. doi: 10.1186/gb-2005-6-3-r25.
- 24. Kircher M, Heyn P, Kelso J. Addressing challenges in the production and analysis of illumina sequencing data. BMC Genomics. 2011;12:382. doi: 10.1186/1471-2164-12-382.
- 25. Schmitt MW, Kennedy SR, Salk JJ, Fox EJ, Hiatt JB, Loeb LA. Detection of ultra-rare mutations by next-generation sequencing. Proc. Natl Acad. Sci. USA. 2012;109:14508–14513. doi: 10.1073/pnas.1208715109.
- 26. Kinde I, Wu J, Papadopoulos N, Kinzler KW, Vogelstein B. Detection and quantification of rare mutations with massively parallel sequencing. Proc. Natl Acad. Sci. USA. 2011;108:9530–9535. doi: 10.1073/pnas.1105422108.
- 27. Jabara CB, Jones CD, Roach J, Anderson JA, Swanstrom R. Accurate sampling and deep sequencing of the HIV-1 protease gene using a Primer ID. Proc. Natl Acad. Sci. USA. 2011;108:20166–20171. doi: 10.1073/pnas.1110064108.
- 28. Ji JP, Loeb LA. Fidelity of HIV-1 reverse transcriptase copying RNA in vitro. Biochemistry. 1992;31:954–958. doi: 10.1021/bi00119a002.
- 29. Hu WS, Hughes SH. HIV-1 reverse transcription. Cold Spring Harb. Perspect. Med. 2012;2:a006882. doi: 10.1101/cshperspect.a006882.
- 30. Chen-Harris H, Borucki MK, Torres C, Slezak TR, Allen JE. Ultra-deep mutant spectrum profiling: improving sequencing accuracy using overlapping read pairs. BMC Genomics. 2013;14:96. doi: 10.1186/1471-2164-14-96.
- 31. Ramaswami G, Zhang R, Piskol R, Keegan LP, Deng P, O'Connell MA, Li JB. Identifying RNA editing sites using RNA sequencing data alone. Nat. Methods. 2013;10:128–132. doi: 10.1038/nmeth.2330.
- 32. Li M, Wang IX, Li Y, Bruzel A, Richards AL, Toung JM, Cheung VG. Widespread RNA and DNA sequence differences in the human transcriptome. Science. 2011;333:53–58. doi: 10.1126/science.1207018.
- 33. Lin W, Piskol R, Tan MH, Li JB. Comment on “Widespread RNA and DNA sequence differences in the human transcriptome”. Science. 2012;335:1302; author reply 1302. doi: 10.1126/science.1210419.
- 34. Kleinman CL, Majewski J. Comment on “Widespread RNA and DNA sequence differences in the human transcriptome”. Science. 2012;335:1302; author reply 1302. doi: 10.1126/science.1209658.
- 35. Pickrell JK, Gilad Y, Pritchard JK. Comment on “Widespread RNA and DNA sequence differences in the human transcriptome”. Science. 2012;335:1302; author reply 1302. doi: 10.1126/science.1210484.
- 36. Abram ME, Ferris AL, Shao W, Alvord WG, Hughes SH. Nature, position, and frequency of mutations made in a single cycle of HIV-1 replication. J. Virol. 2010;84:9864–9878. doi: 10.1128/JVI.00915-10.
- 37. Kashlev M, Nudler E, Severinov K, Borukhov S, Komissarova N, Goldfarb A. Histidine-tagged RNA polymerase of Escherichia coli and transcription in solid phase. Methods Enzymol. 1996;274:326–334. doi: 10.1016/s0076-6879(96)74028-4.
- 38. Borukhov S, Goldfarb A. Purification and assay of Escherichia coli transcript cleavage factors GreA and GreB. Methods Enzymol. 1996;274:315–326. doi: 10.1016/s0076-6879(96)74027-2.
- 39. Kashlev MV, Bass IA, Lebedev AN, Kaliaeva ES, Nikiforov VG. [Deletion-insertion mapping of the region non-essential for functioning of the beta-subunit of Escherichia coli RNA polymerase]. Genetika. 1989;25:396–405.
- 40. Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10:R25. doi: 10.1186/gb-2009-10-3-r25.
- 41. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25:2078–2079. doi: 10.1093/bioinformatics/btp352.
- 42. Kireeva ML, Lubkowska L, Komissarova N, Kashlev M. Assays and affinity purification of biotinylated and nonbiotinylated forms of double-tagged core RNA polymerase II from Saccharomyces cerevisiae. Methods Enzymol. 2003;370:138–155. doi: 10.1016/S0076-6879(03)70012-3.
- 43. Komissarova N, Kireeva ML, Becker J, Sidorenkov I, Kashlev M. Engineering of elongation complexes of bacterial and yeast RNA polymerases. Methods Enzymol. 2003;371:233–251. doi: 10.1016/S0076-6879(03)71017-9.
- 44. Imashimizu M, Kireeva ML, Lubkowska L, Gotte D, Parks AR, Strathern JN, Kashlev M. Intrinsic translocation barrier as an initial step in pausing by RNA polymerase II. J. Mol. Biol. 2013;425:697–712. doi: 10.1016/j.jmb.2012.12.002.
- 45. Sousa R. Structural and mechanistic relationships between nucleic acid polymerases. Trends Biochem. Sci. 1996;21:186–190.
- 46. Ewing B, Green P. Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res. 1998;8:186–194.
- 47.Grant GR, Farkas MH, Pizarro AD, Lahens NF, Schug J, Brunk BP, Stoeckert CJ, Hogenesch JB, Pierce EA. Comparative analysis of RNA-Seq alignment algorithms and the RNA-Seq unified mapper (RUM) Bioinformatics. 2011;27:2518–2528. doi: 10.1093/bioinformatics/btr427. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 48.Niyogi SK, Feldman RP. Effect of several metal ions on misincorporation during transcription. Nucleic Acids Res. 1981;9:2615–2627. doi: 10.1093/nar/9.11.2615. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 49.Walmacq C, Kireeva ML, Irvin J, Nedialkov Y, Lubkowska L, Malagon F, Strathern JN, Kashlev M. Rpb9 subunit controls transcription fidelity by delaying NTP sequestration in RNA polymerase II. J. Biol. Chem. 2009;284:19601–19612. doi: 10.1074/jbc.M109.006908. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 50.Imashimizu M, Tanaka K, Shimamoto N. Comparative study of cyanobacterial and E. coli RNA polymerases: misincorporation, abortive transcription, and dependence on divalent cations. Genet. Res. Int. 2011 doi: 10.4061/2011/572689. 2011, 572689. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 51.Borukhov S, Sagitov V, Goldfarb A. Transcript cleavage factors from E. coli. Cell. 1993;72:459–466. doi: 10.1016/0092-8674(93)90121-6. [DOI] [PubMed] [Google Scholar]
- 52.Petersen C, Moller LB. Invariance of the nucleoside triphosphate pools of Escherichia coli with growth rate. J. Biol. Chem. 2000;275:3931–3935. doi: 10.1074/jbc.275.6.3931. [DOI] [PubMed] [Google Scholar]
- 53.Orlova M, Newlands J, Das A, Goldfarb A, Borukhov S. Intrinsic transcript cleavage activity of RNA polymerase. Proc. Natl Acad. Sci. USA. 1995;92:4596–4600. doi: 10.1073/pnas.92.10.4596. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 54.Nakano T, Ouchi R, Kawazoe J, Pack SP, Makino K, Ide H. T7 RNA polymerases backed up by covalently trapped proteins catalyze highly error prone transcription. J. Biol. Chem. 2012;287:6562–6572. doi: 10.1074/jbc.M111.318410. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 55.Stepanova E, Lee J, Ozerova M, Semenova E, Datsenko K, Wanner BL, Severinov K, Borukhov S. Analysis of promoter targets for Escherichia coli transcription elongation factor GreA in vivo and in vitro. J. Bacteriol. 2007;189:8772–8785. doi: 10.1128/JB.00911-07. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 56.Drake JW. A constant rate of spontaneous mutation in DNA-based microbes. Proc. Natl Acad. Sci. USA. 1991;88:7160–7164. doi: 10.1073/pnas.88.16.7160. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 57.Eisen MB, Spellman PT, Brown PO, Botstein D. Cluster analysis and display of genome-wide expression patterns. Proc. Natl Acad. Sci. USA. 1998;95:14863–14868. doi: 10.1073/pnas.95.25.14863. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 58.Komissarova N, Kashlev M. Functional topography of nascent RNA in elongation intermediates of RNA polymerase. Proc. Natl Acad. Sci. USA. 1998;95:14699–14704. doi: 10.1073/pnas.95.25.14699. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 59.Komissarova N, Becker J, Solter S, Kireeva M, Kashlev M. Shortening of RNA:DNA hybrid in the elongation complex of RNA polymerase is a prerequisite for transcription termination. Mol. Cell. 2002;10:1151–1162. doi: 10.1016/s1097-2765(02)00738-4. [DOI] [PubMed] [Google Scholar]
- 60.Artsimovitch I, Chu C, Lynch AS, Landick R. A new class of bacterial RNA polymerase inhibitor affects nucleotide addition. Science. 2003;302:650–654. doi: 10.1126/science.1087526. [DOI] [PubMed] [Google Scholar]
- 61.Churchman LS, Weissman JS. Nascent transcript sequencing visualizes transcription at nucleotide resolution. Nature. 2011;469:368–373. doi: 10.1038/nature09652. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 62.Schneider TD, Stephens RM. Sequence logos: a new way to display consensus sequences. Nucleic Acids Res. 1990;18:6097–6100. doi: 10.1093/nar/18.20.6097. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 63.Crooks GE, Hon G, Chandonia JM, Brenner SE. WebLogo: a sequence logo generator. Genome Res. 2004;14:1188–1190. doi: 10.1101/gr.849004. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 64.Walmacq C, Cheung AC, Kireeva ML, Lubkowska L, Ye C, Gotte D, Strathern JN, Carell T, Cramer P, Kashlev M. Mechanism of translesion transcription by RNA polymerase II and its role in cellular resistance to DNA damage. Mol Cell. 2012;46:1–12. doi: 10.1016/j.molcel.2012.02.006. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 65.Kireeva ML, Opron K, Seibold SA, Domecq C, Cukier RI, Coulombe B, Kashlev M, Burton ZF. Molecular dynamics and mutational analysis of the catalytic and translocation cycle of RNA polymerase. BMC Biophys. 2012;5:11. doi: 10.1186/2046-1682-5-11. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 66.Nedialkov YA, Opron K, Assaf F, Artsimovitch I, Kireeva ML, Kashlev M, Cukier RI, Nudler E, Burton ZF. The RNA polymerase bridge helix YFI motif in catalysis, fidelity and translocation. Biochim. Biophys. Acta. 2013;1829:187–198. doi: 10.1016/j.bbagrm.2012.11.005. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 67.Saxowsky TT, Meadows KL, Klungland A, Doetsch PW. 8-Oxoguanine-mediated transcriptional mutagenesis causes Ras activation in mammalian cells. Proc. Natl Acad. Sci. USA. 2008;105:18877–18882. doi: 10.1073/pnas.0806464105. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 68.Alon S, Vigneault F, Eminaga S, Christodoulou DC, Seidman JG, Church GM, Eisenberg E. Barcoding bias in high-throughput multiplex sequencing of miRNA. Genome Res. 2011;21:1506–1511. doi: 10.1101/gr.121715.111. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 69.Atkins JF, Bjork GR. A gripping tale of ribosomal frameshifting: extragenic suppressors of frameshift mutations spotlight P-site realignment. Microbiol. Mol. Biol. Rev. 2009;73:178–210. doi: 10.1128/MMBR.00010-08. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 70.Dutta D, Shatalin K, Epshtein V, Gottesman ME, Nudler E. Linking RNA polymerase backtracking to genome instability in E. coli. Cell. 2011;146:533–543. doi: 10.1016/j.cell.2011.07.034. [DOI] [PMC free article] [PubMed] [Google Scholar]
Associated Data
