Accurate RNA consensus sequencing for high-fidelity detection of transcriptional mutagenesis-induced epimutations

Kate S Reid-Bayliss; Lawrence A Loeb

doi:10.1073/pnas.1709166114

2017 Aug 10;114(35):9415–9420. doi: 10.1073/pnas.1709166114

PMCID: PMC5584456 PMID: 28798064

Significance

Epimutations arising from transcriptional mutagenesis have been hypothesized to contribute to viral and bacterial evolution, drug resistance, and age-related diseases, including cancer and neurodegeneration. However, methodology limitations have inhibited progress toward elucidating the contributions of epimutations to cellular evolution and survival in vivo. Recent efforts to overcome these limitations remain constrained by artifacts arising during RNA library preparation. We present accurate RNA consensus sequencing (ARC-seq), an accurate, high-throughput RNA sequencing method that effectively eliminates errors introduced during RNA library preparation and sequencing and represents a major advance over previous methods. ARC-seq will enable investigations of the causal roles of transcriptional fidelity and epimutations in multiple fields, including viral evolution, bacterial resistance, and age-related diseases, such as cancer and neurodegeneration.

Keywords: transcriptional mutagenesis, epimutations, RNA mutations, molecular misreading, RNAseq

Abstract

Transcriptional mutagenesis (TM) due to misincorporation during RNA transcription can result in mutant RNAs, or epimutations, that generate proteins with altered properties. TM has long been hypothesized to play a role in aging, cancer, and viral and bacterial evolution. However, inadequate methodologies have limited progress in elucidating a causal association. We present a high-throughput, highly accurate RNA sequencing method to measure epimutations with single-molecule sensitivity. Accurate RNA consensus sequencing (ARC-seq) uniquely combines RNA barcoding and generation of multiple cDNA copies per RNA molecule to eliminate errors introduced during cDNA synthesis, PCR, and sequencing. The stringency of ARC-seq can be scaled to accommodate the quality of input RNAs. We apply ARC-seq to directly assess transcriptome-wide epimutations resulting from RNA polymerase mutants and oxidative stress.

Infidelity during RNA transcription, termed transcriptional mutagenesis (TM), has long been hypothesized to contribute to aging (1) and age-associated diseases, including cancer (2, 3) and neurodegeneration (4, 5). RNA mutations resulting from TM, termed epimutations, have also been implicated in bacterial and viral evolution and resistance (6–8). Studies on RNA polymerases have revealed the fidelity of in vitro transcription by multiple RNA polymerases to be on the order of 10⁻⁵ epimutations per nucleotide (9–13). This rate can dramatically increase during transcription of damaged templates and certain sequence contexts, such as repetitive DNA (14). Additionally, in vivo assays have revealed that TM can result in phenotypic changes in nondividing (15) and dividing cells (16–19), with the potential for TM-induced phenotypic changes to be heritable (20, 21), indicating that a single mutant transcript has the potential to have profound effects on cellular function.

The bulk of the evidence for TM has been generated using in vitro fidelity assays and highly expressed reporter genes that encompass a small number of sequence contexts, are limited in the spectrum of mutations that can be monitored, and are subject to translational errors convoluting the results (9, 22). Consequently, the results of these studies cannot be easily extrapolated to understand the extent of epimutations in cells, where transcription factors, repair enzymes, chromatin, and gene expression levels modulate transcriptional fidelity. Thus, to elucidate the roles of TM-induced epimutations in physiology, disease, and evolution, it is necessary to study individual RNA molecules transcribed in vivo in a high-throughput manner.

De novo epimutations remain a challenging target for high-throughput RNA sequencing (RNAseq). While in vitro studies estimate RNA polymerase infidelity to be on the order of one epimutation per 100,000 nucleotides, the reverse transcriptase used to generate cDNA from RNA makes approximately one error per 10,000 bases (23). Additionally, Illumina sequencers misread approximately one in 1,000 bases (24). Recent methods, such as barcoding of RNAs (25) or cDNAs (26, 27), reduce the error frequency of RNAseq. However, such methods can be of low yield (25), rely heavily on complicated bioinformatics requiring calibration for each sample (26), and do not address errors introduced during reverse transcription (26, 27). Reverse transcriptase errors can be overcome by generating multiple cDNA copies from each RNA molecule (25, 28). However, these approaches can also be of low yield (25), may themselves introduce errors due to harsh reaction conditions (29), and are limited by sequence read length (28). To date, these advances have proven useful for sequencing viral RNA genomes, which are inherently more error-prone, but their background errors remain too high to reliably detect TM-induced epimutations in cells.

To address the limitations of RNAseq and enable the study of epimutations in any organism, we have developed a highly accurate sequencing method, termed accurate RNA consensus sequencing (ARC-seq), to measure epimutations with single-molecule sensitivity. ARC-seq uniquely couples the use of an adaptor to barcode each RNA molecule and the generation of multiple cDNA copies per RNA molecule before sequencing. This combination enables the removal of artifacts due to cDNA synthesis, PCR errors, and sequencing errors, revealing the epimutations resulting from TM in vivo.

Results

Development of a Highly Accurate Method to Detect Epimutations.

Three obstacles to accurate RNAseq include the following: (i) RNA must first undergo the highly error-prone process of reverse transcription before sequencing, (ii) PCR amplification of cDNA can introduce errors, and (iii) high-throughput sequencing itself is highly error-prone. To overcome these obstacles, we developed ARC-seq, a cDNA library preparation protocol. We start by ligating barcoded RNA adaptors to the 5′-end of fragmented RNA molecules; this adaptor contains 16 random nucleotides that uniquely identify individual RNA molecules (Fig. 1A). Each barcoded RNA molecule is then circularized and reverse-transcribed via rolling-circle reverse transcription. This produces a cDNA multimer containing multiple cDNA copies of the original RNA molecule. After restricting the multimeric cDNA molecule into monomers, we uniquely index each cDNA copy of the original RNA. Each indexed cDNA is then amplified by high-fidelity PCR and sequenced on an Illumina HiSeq instrument. After sequencing, using bioinformatics, the cDNA indexes are used to generate a PCR consensus sequence, eliminating artifacts due to sequencing and PCR errors (Fig. 1B). Finally, the RNA barcode is used to generate a cDNA consensus sequence, eliminating reverse transcription and damage-induced artifacts; thus, we are able to regenerate the original RNA sequence.

Fig. 1. — (A) Overview of the ARC-seq method. (i) Each RNA is ligated to an adaptor containing a unique barcode. Ligated RNAs are then circularized (ii) and subjected to rolling-circle reverse transcription (iii), generating a multimeric cDNA from each RNA molecule. (iv) cDNA multimers are then restricted into monomers, which are cDNA copies of the original RNA molecule. Each cDNA is then tagged with a unique index (v), amplified (vi), and sequenced. (B) Error correction by ARC-seq. (i) Single RNA molecule containing a true epimutation (red); this molecule is barcoded. (ii) Rolling-circle reverse transcription generates multiple cDNA copies from each ligated RNA molecule, introducing random errors (orange). (iii) Amplification and sequencing amplify the existing errors and introduce new errors (purple), further obscuring the true epimutation. Artifacts present in standard RNAseq data are illustrated at this level. (iv) After sequencing, cDNA tags are bioinformatically matched and a consensus sequence is generated for each cDNA copy, eliminating many amplification and sequencing artifacts. (v) Finally, the RNA barcodes are matched, and a consensus sequence is generated from the cDNA copies, which regenerates the original RNA molecule’s sequence, revealing the true epimutation. (C) ARC-seq eliminates damage-induced, reverse transcription, PCR, and sequencing artifacts, revealing true epimutations. High-fidelity (blue), damaged (green), and mutated (purple) RNAs were generated by in vitro transcription by T7 RNA polymerase and sequenced via ARC-seq. While conventional RNAseq has a high level of artifacts, with increased artifacts observed in the damaged RNA template, ARC-seq is able to fully correct damage-induced artifacts, revealing the true epimutation frequency to be ∼2 × 10⁻⁵, without removing true epimutations. Error bars represent Wilson scores of 95% confidence.
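The error bars in the figures are described as Wilson scores of 95% confidence. As a minimal sketch of how such an interval can be computed for a mutation frequency, the Python function below applies the standard Wilson score formula to a count of mutations over a count of nucleotides sequenced. The function name and the example counts are hypothetical and are not taken from the study's data or code.

```python
import math


def wilson_interval(mutations, total, z=1.959964):
    """Wilson score interval for a proportion (z = 1.959964 gives ~95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = mutations / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the boundaries.
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical counts: 42 mutations observed in 1,000,000 nucleotides.
low, high = wilson_interval(42, 1_000_000)
print(f"frequency = {42 / 1_000_000:.2e}, 95% interval = ({low:.2e}, {high:.2e})")
```

Unlike the simple normal approximation, the Wilson interval remains well behaved when the observed count is very small or zero, which is the typical situation for rare mutation subtypes.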

The upper estimate of next-generation sequencing error is one in 100 nucleotides sequenced (30); thus, the theoretical background of ARC-seq approaches 0.01ⁿ, where n is the number of cDNA copies produced from each RNA molecule. By increasing the length of the rolling-circle reverse transcription reaction, we can generate more cDNA copies per RNA molecule, thus increasing the stringency of the error correction of ARC-seq. This enables accurate sequencing of even highly damaged RNA molecules.
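To make this scaling concrete, the following Python sketch tabulates the theoretical background for several values of n. It is an illustration only and not part of the published pipeline: the function name `theoretical_background` is hypothetical, and the calculation assumes that errors arise independently in each cDNA copy at the upper-bound rate of 0.01 per nucleotide.

```python
def theoretical_background(n_copies, per_copy_error=0.01):
    """Chance that the same position is in error in all n independent copies."""
    if n_copies < 1:
        raise ValueError("n_copies must be at least 1")
    return per_copy_error ** n_copies


for n in (1, 2, 3, 5):
    print(f"n = {n}: theoretical background ~ {theoretical_background(n):.0e}")
```

Under these assumptions, three cDNA copies already place the theoretical background near 10⁻⁶, below the in vitro transcription error rate of roughly 10⁻⁵ cited in the introduction.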

ARC-Seq Effectively Corrects Reverse Transcription, PCR, and Sequencing Artifacts.

To validate the power of ARC-seq to eliminate artifacts due to reverse transcription, PCR, and sequencing errors, we synthesized three types of RNAs by in vitro transcription, using T7 RNA polymerase (Fig. S1): (i) high-fidelity RNA, generated using a pristine DNA template [expected epimutation frequency of 3 × 10⁻⁵ (12)]; (ii) damaged RNA, generated by treating the high-fidelity RNA with hydrogen peroxide (H₂O₂) [expected epimutation frequency is the same as the high-fidelity RNA (3 × 10⁻⁵) because no new mutations are introduced]; and (iii) mutated RNA, which was generated from a DNA template oxidatively damaged with H₂O₂ to induce mistakes during transcription (expected to have an elevated epimutation frequency). These RNAs were then sequenced via ARC-seq. At a cDNA family size of one, which corresponds to RNAseq with tag-based error correction (e.g., ref. 27), the error frequency of the high-fidelity RNA is ∼2 × 10⁻⁴, ∼10-fold higher than the expected epimutation frequency (Fig. 1C); the error frequency of the damaged RNA template is approximately threefold higher than that of the high-fidelity RNA, consistent with the high error rate of conventional RNAseq, especially on damaged RNA templates (23).

Fig. S1. — Generation of IVT RNA templates for validation of artifact elimination by ARC-seq. High-fidelity (blue), damaged (green), and mutated (purple) RNAs were generated by in vitro transcription by T7 RNA polymerase.

In contrast, by requiring five unique cDNA copies per RNA molecule, ARC-seq reveals the true epimutation frequency of the high-fidelity RNA to be ∼2 × 10⁻⁵. Furthermore, by requiring six cDNA copies per RNA molecule to form a consensus sequence, and therefore increasing the stringency of its error correction, ARC-seq fully corrects for damage-induced artifacts and reveals the true epimutation frequency of the damaged RNA to be equivalent to the undamaged high-fidelity RNA. In contrast, even with a high stringency of eight cDNA copies per RNA molecule, the mutation frequency of the mutated RNA remains more than 10-fold greater than the high-fidelity RNA, consistent with ARC-seq eliminating errors without mistakenly removing true epimutations. Thus, by repeatedly sequencing the same RNA molecule, ARC-seq eliminates damage-induced and sequencing artifacts, revealing the TM-induced epimutations present in the original RNA molecule.

ARC-Seq Reveals the Frequency and Spectrum of Epimutations in Vivo.

Several mutants of Saccharomyces cerevisiae (yeast) have been shown to have reduced in vitro RNA synthesis fidelity. Rpb1 E1103G is a point mutant of the catalytic domain of RNA polymerase II and confers dependence on transcription factor S-II (13). ΔRpb9 is a deletion mutant of a transcription factor that enhances the fidelity of mRNA transcription in yeast (31). To establish ARC-seq’s utility for measuring in vivo epimutations, we applied the method to study TM in these yeast mutants. When we analyze the epimutation frequencies obtained at increasing cDNA copy number per RNA molecule, we find that the mRNA and rRNA mutation frequencies of all three yeast strains plateau with just three cDNA copies per RNA molecule (Fig. 2A); the mRNA mutation frequency of stationary phase wild-type yeast is 4.21 × 10⁻⁵, more than an order of magnitude lower than the error frequency obtained with conventional RNAseq (Fig. 2B). Additionally, both RNA polymerase mutants have mRNA mutation frequencies elevated over wild type: 5.94 × 10⁻⁵ (P < 2.2 × 10⁻¹⁶) and 7.28 × 10⁻⁵ (P < 2.2 × 10⁻¹⁶) for E1103G and ΔRpb9, respectively. In contrast, consistent with the yeast mutants having error-prone RNA polymerase II transcription, the frequencies of mutations in both mutants’ rRNAs, which are transcribed by RNA polymerases I and III, are not significantly different from the frequency of mutations in wild-type yeast. Furthermore, the mutation spectra reveal differences between the types of mRNA mutations induced in the three yeast strains (Fig. 2C and Table S1). While C→U mutations are the most frequently observed epimutation in all three yeast strains, both mutants show elevated frequencies of U→C, G→A, U→A, and C→A mutations, as well as single-base insertions, in their mRNAs relative to wild type. Additionally, E1103G has a greater elevation in U→C mutations than ΔRpb9 in its mRNA, whereas ΔRpb9 has greater elevations in G→A, U→A, and C→A mutations, relative to E1103G, in its mRNA. In the rRNAs, by contrast, no mutation subtype of either mutant differed significantly from wild type (Fig. 2D and Table S2), consistent with the defects of E1103G and ΔRpb9 being restricted to RNA polymerase II transcription.

Fig. 2. — ARC-seq reveals differences in epimutation frequencies and in the spectrum between yeast RNA polymerase mutants. RNAs from wild-type (WT), E1103G, and ΔRpb9 yeasts were sequenced via ARC-seq, with the number of cDNA copies per RNA molecule required to generate a consensus sequence varied from one through five. (A) Epimutation frequency stabilizes at three cDNAs per RNA molecule, revealing epimutation frequency differences between WT and the two mutants. (B) Comparison of epimutation frequencies observed with one cDNA copy per RNA molecule, corresponding to conventional RNAseq with tag-based error correction, and three cDNA copies per RNA molecule for mRNA (Left) and rRNA (Right). Differences between the mRNA (C) and rRNA (D) mutation spectra of WT and mutant yeasts are shown. Error bars represent Wilson scores of 95% confidence. *P < 0.01, **P < 10⁻⁵, ***P < 10⁻¹⁰, ****P < 10⁻¹⁵.

Table S1.

mRNA mutation frequencies by mutation type of WT and mutant (E1103G and ΔRPB9) yeast

Mutation type	RNA mutation frequency^*			P value^†
Mutation type	WT	E1103G	ΔRPB9	WT vs. E1103G	WT vs. ΔRPB9	E1103G vs. ΔRPB9
Transitions
A→G	2.15E-06	3.00E-06	3.22E-06	—	—	—
U→C	8.39E-06	3.66E-05	1.80E-05	2.20E-16	2.55E-09	6.66E-14
G→A	4.81E-06	4.95E-05	9.32E-05	2.20E-16	2.20E-16	2.20E-16
C→U	1.32E-04	1.31E-04	1.49E-04	—	6.55E-03	1.59E-03
Transversions
G→C	1.79E-06	1.90E-06	3.83E-06	—	—	—
C→G	6.27E-07	6.47E-07	1.75E-06	—	—	—
A→C	2.08E-07	5.34E-07	4.20E-07	—	—	—
U→G	6.62E-07	5.30E-07	1.02E-06	—	—	—
A→U	6.24E-07	1.03E-06	1.40E-06	—	—	—
U→A	8.83E-07	3.53E-06	8.42E-06	2.20E-06	2.20E-16	3.39E-07
G→U	3.34E-05	2.97E-05	3.83E-05	—	—	2.66E-03
C→A	7.16E-07	2.21E-06	4.56E-06	3.47E-03	3.83E-07	4.97E-03
Indels
Insert	2.39E-06	3.88E-06	4.13E-06	6.41E-06	5.50E-05	—
Delete	2.19E-06	1.70E-06	1.68E-06	—	—	—

^*Mutation frequency is total RNA mutations over all nucleotides sequenced.

^†P values were calculated by the two-sample test for equality of proportions with continuity correction. P values >0.01 not shown for clarity.
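For readers who wish to reproduce this kind of comparison, the following Python sketch implements a two-sample test for equality of proportions with a continuity (Yates) correction, using only the standard library. It is an assumption that this matches the software used in the study, which the tables do not name; the function name and the example counts are hypothetical.

```python
import math


def prop_test_2sample(x1, n1, x2, n2):
    """Chi-square test (1 df) that two proportions are equal, Yates-corrected.

    x1, x2: mutation counts; n1, n2: nucleotides sequenced in each sample.
    Returns (chi_square_statistic, p_value).
    """
    pooled = (x1 + x2) / (n1 + n2)
    if pooled in (0.0, 1.0):
        return 0.0, 1.0  # no variation at all, so nothing to test
    chi2 = 0.0
    for x, n in ((x1, n1), (x2, n2)):
        for observed, expected in ((x, n * pooled), (n - x, n * (1 - pooled))):
            # Continuity correction: shrink |O - E| by 0.5, but not below zero.
            diff = max(abs(observed - expected) - 0.5, 0.0)
            chi2 += diff * diff / expected
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts, not taken from the tables.
stat, p = prop_test_2sample(15, 100, 30, 100)
print(f"chi-square = {stat:.3f}, P = {p:.4f}")
```

With the very large numbers of nucleotides sequenced in a transcriptome-wide experiment, even modest differences in frequency can yield extremely small P values, so the fold difference is worth reading alongside the P value.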

Table S2.

rRNA mutation frequencies by mutation type of WT and mutant (E1103G and ΔRPB9) yeast

Mutation type	RNA mutation frequency^*			P value^†
Mutation type	WT	E1103G	ΔRPB9	WT vs. E1103G	WT vs. ΔRPB9	E1103G vs. ΔRPB9
Transitions
A→G	4.05E-06	2.18E-06	4.79E-06	—	—	—
U→C	1.05E-05	1.34E-05	1.28E-05	—	—	—
G→A	8.92E-06	1.29E-05	1.87E-05	—	—	—
C→U	1.11E-04	1.22E-04	1.16E-04	—	—	—
Transversions
G→C	2.97E-06	3.00E-06	4.83E-06	—	—	—
C→G	2.09E-06	1.81E-06	8.58E-07	—	—	—
A→C	6.75E-07	4.36E-07	0.00E+00	—	—	—
U→G	3.76E-06	1.34E-06	2.43E-06	—	—	—
A→U	2.02E-06	5.23E-06	8.99E-06	—	—	—
U→A	3.01E-06	2.23E-06	5.48E-06	—	—	—
G→U	3.05E-05	2.66E-05	3.56E-05	—	—	—
C→A	0.00E+00	1.81E-06	1.72E-06	—	—	—
Indels
Insert	2.35E-06	2.70E-06	4.40E-06	—	—	—
Delete	3.32E-06	1.76E-06	5.05E-06	—	—	7.71E-04

^*Mutation frequency is total RNA mutations over all nucleotides sequenced.

^†P values were calculated by the two-sample test for equality of proportions with continuity correction. P values >0.01 not shown for clarity.

Oxidative Stress Induces TM in Vivo.

DNA damage due to oxidative stress is well known to induce DNA mutations, and in vitro studies of RNA polymerase activity at DNA lesions indicate that it behaves similarly to DNA polymerases (14). Thus, to determine if oxidative stress induces elevated TM in vivo, we treated log-phase wild-type yeast with 50 μM H₂O₂ for 30 min, extracted the RNAs, and sequenced them via ARC-seq. Following oxidative stress, the mRNA mutation frequency increases from 5.6 × 10⁻⁵ to 1.3 × 10⁻⁴ (Fig. 3A). While oxidative stress induces elevations in multiple mutation subtypes, the largest increases observed are in G→A and U→G substitutions, each induced 80-fold, and in C→A substitutions, induced 164-fold (Fig. 3B and Table S3). In rRNA, nearly all mutation subtypes increase following oxidative stress, with the largest increase again being in C→A substitutions (Fig. 3C and Table S4), induced 217-fold. The dramatic increases in C→A mutations are consistent with TM of the 8-oxodG lesion in template DNA, which is the most common form of oxidative DNA damage in cells (32–35). Additionally, the large increase in G→A mutations in mRNA is consistent with TM across from deaminated cytosines in the DNA template, a common consequence of oxidative stress in cells (34, 36).
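The fold inductions quoted above follow directly from the control and treated frequencies listed in Table S3. The short Python sketch below recomputes three of them; the helper name `fold_change` is hypothetical, and returning `None` when either frequency is zero is an assumption chosen to mirror the dash shown for such rows in the table.

```python
def fold_change(ctrl, treated):
    """Ratio of treated to control frequency; None if either value is zero."""
    if ctrl <= 0 or treated <= 0:
        return None
    return treated / ctrl


# (control, H2O2-treated) mRNA mutation frequencies copied from Table S3.
table_s3_subset = {
    "G>A": (1.93e-06, 1.54e-04),
    "U>G": (5.32e-08, 4.24e-06),
    "C>A": (3.34e-07, 5.47e-05),
}

for mutation, (ctrl, treated) in table_s3_subset.items():
    print(f"{mutation}: {fold_change(ctrl, treated):.0f}-fold")
```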

Fig. 3. — ARC-seq reveals differences in TM after oxidative stress between yeast RNA fidelity mutants and between RNA types. Wild-type yeast was exposed to H₂O₂, and its RNAs were then sequenced via ARC-seq. (A) Comparison of mRNA mutation frequencies observed with one cDNA copy per RNA molecule, corresponding to conventional RNAseq with tag-based error correction, and with three cDNA copies per RNA molecule. (B) mRNA frequency and spectrum in untreated (ctrl) and 50 μM-treated (H₂O₂) yeasts. (C) Frequency and spectrum of rRNA in ctrl and 50 μM-treated (H₂O₂) yeasts. Error bars represent Wilson scores of 95% confidence. *P < 0.05, **P < 10⁻², ***P < 10⁻⁵, ****P < 10⁻¹⁰, *****P < 10⁻¹⁵.

Table S3.

Mutation types induced in mRNA upon oxidative stress treatment

Mutation type	mRNA mutation frequency^*		Fold change	P value^†
Mutation type	Ctrl	H₂O₂	Fold change	P value^†
Transitions
A→G	3.34E-06	9.35E-06	2.8	—
U→C	7.87E-06	1.27E-05	1.6	—
G→A	1.93E-06	1.54E-04	80	2.20E-16
C→U	2.01E-04	2.31E-04	1.2	—
Transversions
G→C	2.23E-06	1.85E-05	8.3	5.23E-04
C→G	4.60E-06	1.26E-05	2.7	—
A→C	7.84E-07	1.40E-05	18	1.16E-07
U→G	5.32E-08	4.24E-06	80	4.15E-04
A→U	6.79E-07	0.00E+00	—	—
U→A	4.26E-07	1.27E-05	30	3.87E-12
G→U	2.61E-05	3.09E-05	1.2	—
C→A	3.34E-07	5.47E-05	164	2.20E-16
Indels
Insert	1.88E-06	5.89E-06	3.1	0.02
Delete	2.21E-06	5.89E-06	2.7	—

^*Mutation frequency is total RNA mutations over all nucleotides sequenced.

^†P values were calculated by the two-sample test for equality of proportions with continuity correction. P values >0.05 not shown for clarity.

Table S4.

Mutation types induced in rRNA upon oxidative stress treatment

Mutation type	rRNA mutation frequency^*		Fold change	P value^†
Mutation type	Ctrl	H₂O₂	Fold change	P value^†
Transitions
A→G	3.80E-06	1.86E-05	4.9	4.19E-07
U→C	4.49E-06	2.47E-05	5.5	1.95E-10
G→A	3.02E-06	6.41E-05	21	2.20E-16
C→U	5.36E-05	2.09E-04	3.9	2.20E-16
Transversions
G→C	2.26E-06	1.25E-05	5.5	7.98E-06
C→G	3.48E-06	1.02E-05	2.9	—
A→C	8.41E-07	1.86E-06	2.2	—
U→G	4.59E-07	1.14E-05	25	2.20E-16
A→U	1.27E-06	2.05E-05	16	2.20E-16
U→A	7.91E-07	1.33E-05	17	2.20E-16
G→U	1.76E-05	8.01E-05	4.6	2.20E-16
C→A	2.81E-07	6.11E-05	217	2.20E-16
Indels
Insert	9.01E-07	3.47E-06	3.9	7.24E-04
Delete	1.46E-06	8.43E-06	5.8	3.22E-14

^*Mutation frequency is total RNA mutations over all nucleotides sequenced.

^†P values were calculated by the two-sample test for equality of proportions with continuity correction. P values >0.05 not shown for clarity.

Discussion

TM is hypothesized to play roles in aging, cancer, neurodegeneration, viral evolution, and drug resistance (14, 37–39). However, little progress has been made in elucidating the contribution of TM to human health and disease, because methods for detecting epimutations in vivo have been limiting. The requirement to reverse-transcribe RNA before sequencing, as well as the high error rate of next-generation sequencing itself, constrains the accuracy of conventional RNAseq. Recent efforts to overcome these limitations have made progress toward more accurate RNA sequencing (25–29, 40, 41) but do not adequately remove the artifacts arising from these sources of error.

In developing ARC-seq, we reasoned that by generating multiple cDNA copies per RNA molecule, we could markedly reduce reverse transcription, PCR, and sequencing errors. We used a molecular barcode strategy to uniquely identify each RNA molecule before generating and sequencing multiple cDNA copies of each original RNA molecule. Furthermore, to distinguish between cDNA duplicates of a single RNA molecule and PCR duplicates of a single cDNA copy, and thereby eliminate PCR errors, we introduced an additional index sequence to each cDNA molecule. This unique combination enables the elimination of artifacts due to cDNA synthesis, PCR errors, and sequencing errors, revealing the sequence of the original RNA molecule.

Applying ARC-seq to sequencing in vitro-transcribed (IVT) RNAs demonstrates its unique ability to modulate the stringency of the method’s error correction. This increased stringency selectively eliminates artifacts and enables even highly damaged RNAs to be sequenced accurately, which will likely prove crucial to many in vivo applications, such as analyses of biopsies or postmortem tissues, where RNAs may be partially degraded and highly damaged.

Applying ARC-seq to the study of RNAs generated in vivo by yeast RNA polymerase mutants demonstrates its ability to sensitively detect mutation frequency and spectrum differences. While several mutation types are elevated in the mRNA of both mutants, relative to wild type, there are no elevations in rRNA mutations in either mutant. These results not only confirm the specificity of the yeast mutants’ defects in the fidelity of RNA polymerase II transcription but also serve as confirmation that ARC-seq accurately reveals epimutations. Importantly, we determined the frequency of each mutation type by the number of mutations observed over the total number of observations of the wild-type nucleotide; therefore, the differences observed are not due to differences in nucleotide distribution between the three strains. Of note, while not differing dramatically between the three yeast strains, C→U is the most frequent mutation observed; an unknown fraction of these mutations could be the result of deamination, either spontaneously or due to the action of cytosine deaminases on RNA rather than transcriptional infidelity. Further cell-based studies altering the expression of various cytosine deaminases could elucidate the extent to which TM versus RNA deamination contributes to C→U mutations in RNA.
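The normalization described above, in which each mutation type is divided by the number of observations of its wild-type nucleotide, can be sketched as follows. This is an illustration under stated assumptions rather than the study's code: the function name and the input format, a list of (reference base, consensus base) pairs with `N` marking undefined consensus positions, are hypothetical.

```python
from collections import Counter


def mutation_spectrum(pairs):
    """Frequency of each mutation type, normalized by its reference base count.

    pairs: iterable of (reference_base, consensus_base); 'N' calls are skipped.
    """
    ref_counts = Counter()
    mutation_counts = Counter()
    for ref, obs in pairs:
        if obs == "N":
            continue  # undefined consensus positions are not observations
        ref_counts[ref] += 1
        if obs != ref:
            mutation_counts[(ref, obs)] += 1
    return {
        f"{ref}>{obs}": count / ref_counts[ref]
        for (ref, obs), count in mutation_counts.items()
    }


# Toy input: ten observed C positions, one read as U; five unchanged G positions.
toy_pairs = [("C", "C")] * 9 + [("C", "U")] + [("G", "G")] * 5
print(mutation_spectrum(toy_pairs))
```

Because the denominator is specific to each reference base, a strain whose sequenced RNAs happen to be richer in one nucleotide does not appear to have more mutations of that nucleotide for that reason alone.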

Finally, applying ARC-seq to the study of the transcriptional mutagenic consequences of oxidative stress demonstrates its utility for addressing important biological questions. We see that oxidative stress induces high levels of epimutations not only in mRNA but also in rRNA. These results suggest that oxidative DNA damage, whether due to exogenous agents or endogenous perturbations, could have profound yet unappreciated consequences for cells. Of interest, the untreated rRNA mutation frequency of wild-type yeast was approximately twofold lower than its mRNA mutation frequency, largely due to decreased C→U mutations. Two possible explanations may account for this difference: (i) rDNA may be more readily repaired than protein-coding gene regions in the genome or (ii) the fidelity of rRNA synthesis is higher than the fidelity of mRNA synthesis. Given that rRNA is longer lived and involved in protein translation, either of these possibilities has merit. While a mutated mRNA may be translated multiple times, yielding a pool of mutant proteins, codon redundancy limits the impact of an individual mutation, and even if a codon change results, there is still only that one protein species affected by the TM event. In contrast, a mutated rRNA could disrupt the function or fidelity of the ribosome, potentially creating many more mutant proteins, which would be a worse consequence for the cell. Therefore, rDNA genome regions may be more closely protected against the persistence of DNA damage, or RNA polymerases I and III may have higher fidelity than RNA polymerase II. Further studies combining measurement of DNA damage distribution with TM studies of rRNA and mRNA may help distinguish between these possibilities.

An important consideration in studying TM using ARC-seq is the scale of study desired. While we herein presented whole-transcriptome data, such a broad view may not always be desired or feasible. Two potential modifications to ARC-seq are possible that enable focusing on specific loci. First, one could enrich transcript regions of interest after ARC-seq library preparation, before sequencing, via either single (42) or double capture (43). Such methods have been instrumental in enabling studies of small genomic regions in mammalian systems and would be an easy addition to ARC-seq; however, they require gene capture sets and additional steps after the initial library preparation. The second option is to use transcript-specific primers (44) during rolling-circle reverse transcription instead of the primer against the RNA adaptor. Transcript-specific reverse transcription represents a minor modification to ARC-seq as presented and would enable targeting of specific transcripts with only a primer, greatly minimizing the expense of targeting, relative to the capture approach.

Conclusions

We have developed a highly accurate RNA sequencing method that effectively eliminates artifacts due to reverse transcription, PCR, and sequencing errors. ARC-seq represents a major advance over previous methods. First, the method itself uses low temperatures, neutral solutions, and short incubations whenever possible, thereby minimizing the damage to the RNA template that limits other methods (28, 29). Next, it reliably generates multiple cDNA copies from each RNA molecule with high yield, a significant advance over prior attempts that were limited by low yields (25). This high-yield cDNA copy generation also accounts for its scalable stringency, which enables highly accurate detection of TM-induced epimutations even from highly damaged sources. Finally, because it is not limited by sequence length, ARC-seq can be applied to any sample, without limitations on accuracy; with minor modifications, it could be used to look at transcriptome-wide TM, as we have demonstrated, or gene-specific TM to drive studies of the role of epimutations at specific loci in disease processes.

The accuracy, sensitivity, and scalable stringency of ARC-seq make it advantageous for application to numerous biological questions that have remained intractable to date. Future studies of TM in model systems, such as the yeast mutants studied here, could explore how perturbing or enhancing various aspects of transcription, including transcription factors or the nucleotide pool, affects transcriptional fidelity (22). Such studies could not only provide greater insight into the basic biology of transcription but also lead to studies on how perturbing the transcriptional apparatus may be therapeutically useful in cancer and microbial diseases (2, 6, 45). Applying ARC-seq to studies of RNA viral populations could provide greater insight into the nature of quasispecies and how, and under which conditions, viral populations evolve, and could provide insight into how to prevent therapeutic resistance or even directly manipulate viral transcriptional apparatuses to induce lethal mutagenesis (7, 8). Additionally, applying ARC-seq to studies of TM in aging and neurodegeneration could elucidate whether epimutations underlie the pathologies of age-related diseases, such as sporadic Alzheimer’s disease (4, 15) and cancer (6, 20), and, finally, address the long-standing hypothesis of protein synthesis errors driving aging and disease (37–39).

Methods

IVT RNAs were generated from a single-stranded m13mp18 DNA template via an established protocol (46), using T7 RNA polymerase. To generate damaged IVT RNA, following transcription, the high-fidelity RNA was treated with 100 μM H₂O₂ and FeCl₃ to induce oxidative damage, according to an established protocol (47). To generate mutated IVT RNA, the m13mp18 DNA template was treated with 1 mM H₂O₂ before transcription.

Wild-type yeast and E1103G yeast were a gift from Mikhail Kashlev at the NIH/National Cancer Institute (NCI), Bethesda, and ΔRpb9 yeast was a gift from Jeffrey Strathern at the NIH/NCI. To measure TM in yeast, log-phase yeast or stationary-phase yeast was pelleted, washed with cold 1× PBS, and repelleted. The cell walls were then digested by incubating cells in a buffer containing sorbitol and 100 units of Zymolyase, according to an established protocol (48). RNAs were then extracted, enriching for mRNA, using the Dynabead mRNA Direct Kit from Ambion. Extracted RNAs were stored in 10 mM Tris and 0.1 mM EDTA buffer (pH 8.0) made with diethyl pyrocarbonate (DEPC)-treated nuclease-free water, with 100 units of murine RNase inhibitor from New England Biolabs (NEB) added, at −80 °C.

RNA Library Preparation.

RNA libraries were prepared via the ARC-seq protocol, as detailed in SI Materials and Methods. Briefly, fragmented RNAs were end-repaired, preadenylated, and ligated to ARC-seq adaptors. Adapted RNAs were circularized and then subjected to rolling-circle reverse transcription to generate multimeric cDNAs. The cDNA multimers were restricted into cDNA monomers, each of which was subsequently indexed via 5′-overhang extension PCR. Indexed cDNA monomers were amplified and sequenced on an Illumina HiSeq instrument, using the dual-indexing protocol.

Data Processing.

Reads were filtered for those containing properly located tag sequences, and the 16-nt RNA barcode and 8-nt cDNA index were combined to create a 24-nt tag for each read. Reads containing identical tag sequences were grouped together to form PCR consensus reads. PCR consensus reads sharing identical 16-nt RNA barcodes were then grouped together to form cDNA consensus families. The cDNA consensus for any position is considered undefined if the position is represented by fewer than n instances in the family or if less than 70% of the sequences at that position in the read are in agreement; n represents the number of cDNA copies generated from each RNA molecule and can be adjusted to increase assay stringency if the RNA template is damaged. Further details are provided in SI Materials and Methods.
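The two-level grouping described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the pipeline used in the study: the function names are hypothetical, reads are assumed to arrive as (tag, sequence) pairs already aligned to a common start, the 24-nt tag is assumed to be the RNA barcode followed by the cDNA index, and the 70% agreement rule is assumed to apply at both consensus steps.

```python
from collections import Counter, defaultdict

BARCODE_LEN = 16  # RNA barcode length stated in the text
INDEX_LEN = 8     # cDNA index length stated in the text


def column_consensus(seqs, min_count=1, min_agreement=0.7):
    """Call a consensus base per position; 'N' marks an undefined position."""
    consensus = []
    for i in range(max(len(s) for s in seqs)):
        bases = [s[i] for s in seqs if i < len(s) and s[i] != "N"]
        if len(bases) < min_count:
            consensus.append("N")  # too few observations at this position
            continue
        base, count = Counter(bases).most_common(1)[0]
        consensus.append(base if count / len(bases) >= min_agreement else "N")
    return "".join(consensus)


def arc_consensus(reads, n_copies=3):
    """Collapse (tag, sequence) reads into one consensus per RNA barcode."""
    by_tag = defaultdict(list)
    for tag, seq in reads:
        if len(tag) == BARCODE_LEN + INDEX_LEN:  # keep well-formed tags only
            by_tag[tag].append(seq)
    # Step 1: PCR consensus, one per 24-nt tag (one per indexed cDNA copy).
    by_barcode = defaultdict(list)
    for tag, seqs in by_tag.items():
        by_barcode[tag[:BARCODE_LEN]].append(column_consensus(seqs))
    # Step 2: cDNA consensus per barcode, requiring n copies at each position.
    return {
        barcode: column_consensus(copies, min_count=n_copies)
        for barcode, copies in by_barcode.items()
        if len(copies) >= n_copies
    }


barcode = "ACGT" * 4
example_reads = [
    (barcode + "AAAAAAAA", "ACGTAC"),
    (barcode + "AAAAAAAA", "ACGTAC"),
    (barcode + "AAAAAAAA", "ACGTAC"),
    (barcode + "AAAAAAAA", "ACGTAT"),  # error in one PCR duplicate
    (barcode + "CCCCCCCC", "ACGTAC"),
    (barcode + "GGGGGGGG", "ACTTAC"),  # error confined to one cDNA copy
    (barcode + "TTTTTTTT", "ACGTAC"),
]
print(arc_consensus(example_reads, n_copies=3))
```

In this toy example, the error in one PCR duplicate is removed at the first step and the error confined to a single cDNA copy is removed at the second. Raising `n_copies` above the number of cDNA copies available for a barcode drops that family entirely, which is how stringency trades yield for accuracy in this sketch.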

SI Materials and Methods

ARC-Seq Protocol for RNA Library Preparation.

A 9-μL aliquot of each RNA was fragmented using Ambion RNA Fragmentation Reagents, per the manufacturer’s protocol. End-repair of the resulting fragmented RNA was accomplished in two steps: (i) dephosphorylation of the 5′- and 3′-ends by shrimp alkaline phosphatase (NEB) and (ii) 5′-phosphorylation by OptiKinase, a polynucleotide kinase (PNK) mutant (Affymetrix), both per the manufacturers’ protocols. A concurrent DNase I treatment was carried out during the 5′-phosphorylation reaction to eliminate any residual DNA contamination. RNA fragments larger than the optimal range of 150–300 nt were removed by adding 1.5 vol of AMPure RNA Clean XP beads (Agencourt) to end-repaired RNA and transferring the supernatant to a separate tube. The beads were washed with 80% ethanol, and RNA was eluted in TElow [10 mM Tris⋅HCl (pH 8.0) and 0.1 mM EDTA], according to the manufacturer’s protocol. RNA (50–200 fmol) was then 5′-adenylated using Mth ligase from NEB, according to the manufacturer’s protocol, and purified with 1.5 vol of AMPure RNA Clean XP beads. The 5′-adenylated RNA was ligated to the 3′-end of the custom RNA adaptor obtained from IDT (ARC-seq adaptor: 5′-NNNNNNNNNNNNNNNNAGAUCGGAAGAGCACACGUCUGAACUCCAGUCACGGCGCGCCCCUACACGACGCUCUUCCGAUCUGA-3′, where NNNNNNNNNNNNNNNN represents a 16-nt random barcode) at a molar ratio of 4:1 (adaptor/RNA), using NEB’s RNA ligase 2, truncated KQ mutant, with incubation at 16 °C for 16–18 h, and then purified using the RNA Clean and Concentrator Kit (Zymo Research), according to the manufacturer’s protocol.

The ARC-seq adaptor is not 5′-phosphorylated, which minimizes self-ligation during adaptor/RNA ligation. To enable circularization, the adapted RNA was 5′-phosphorylated using T4 PNK (NEB), per the manufacturer’s protocol, and cleaned with 1 vol of AMPure RNA Clean XP beads. The 5′-phosphorylated adapted RNA was circularized using RNA ligase 1 (NEB), incubated at 37 °C for 1 h and then at 16 °C for 16–18 h, and then purified with 1 vol of AMPure RNA Clean XP beads. Circularized RNA was then converted to cDNA multimers via rolling-circle reverse transcription, using ProtoScript RT (NEB), and incubated at 42 °C for 20–24 h. Multimeric cDNA was size-selected with 0.6 vol of AMPure XP beads, washed, and eluted. This selected for cDNAs greater than 800 nt and against cDNAs generated by a reverse transcriptase circumnavigating a circularized RNA fewer than three times; after this step, each multimeric cDNA contained three or more cDNA copies of a single RNA molecule. Multimeric cDNA was then digested into cDNA monomers by annealing a primer (5′-TGAACTCCAGTCACGGCGCGCCCCTACACGACGCT-3′) to the cDNA multimer and restricting with AscI (NEB), per the manufacturer’s protocol; this was repeated once to ensure complete restriction. The cDNA monomers were purified with 1 vol of AMPure XP beads.
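As a rough arithmetic check on the size selection, the Python sketch below estimates how many monomer copies a multimer needs to exceed 800 nt. It is an illustration only and rests on simplifying assumptions that are not from the protocol: the bead cutoff is treated as a sharp 800-nt threshold, each copy is taken to be one full RNA insert plus the full adaptor, and the function names are hypothetical.

```python
# ARC-seq adaptor as given in the text: 16-nt random barcode plus constant region.
ADAPTOR = "N" * 16 + (
    "AGAUCGGAAGAGCACACGUCUGAACUCCAGUCAC"
    "GGCGCGCC"  # AscI recognition site
    "CCUACACGACGCUCUUCCGAUCUGA"
)
SIZE_CUTOFF_NT = 800  # assumed sharp cutoff; bead selection is gradual in practice

def monomer_length(insert_nt):
    """Length of one circle traversal: RNA insert plus adaptor."""
    return insert_nt + len(ADAPTOR)

def min_copies_above_cutoff(insert_nt, cutoff_nt=SIZE_CUTOFF_NT):
    """Smallest whole number of copies whose total length exceeds the cutoff."""
    return cutoff_nt // monomer_length(insert_nt) + 1

for insert in (150, 300):
    print(insert, monomer_length(insert), min_copies_above_cutoff(insert))
# prints "150 233 4" then "300 383 3"
```

Under these assumptions, inserts across the 150–300-nt range need at least three copies to pass the cutoff, which is consistent with the statement that each retained multimer contained three or more cDNA copies.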

Each cDNA monomer was then indexed using KAPA HiFi DNA polymerase (KAPA Biosystems) to tail-in a PCR-ID (5′-AATGATACGGCGACCACCGAGATCTTTCANNNNNNNNNNNACACTCTTTCCCTACACGACGCTCTTCCGATCTGA-3′, where NNNNNNNNNNN represents an 11-nt random barcode), per the manufacturer’s protocol, but only running three heating cycles. This indexing reaction uniquely barcoded each cDNA monomer to enable discrimination between PCR duplicates and cDNA duplicates of original RNA molecules, which was crucial to generate cDNA families for each unique RNA molecule. After being cleaned with 0.8 vol of AMPure XP beads, indexed DNA/cDNA was amplified by real-time PCR with KAPA HiFi DNA polymerase and Illumina primers (P5: 5′-AATGATACGGCGACCACCGAGA-3′; and P7: 5′-GTGACTGGAGTTCAGACGTGTGC-3′). Starting with 50–200 fmol of end-repaired RNA yielded sufficient DNA to amplify to optical detection by 11–13 PCR cycles. Amplified DNA was purified with 0.8 vol of AMPure XP beads.

For multiplexing sequencing runs, each DNA library was amplified for six cycles using KAPA HiFi DNA polymerase and Illumina primers (P5: 5′-AATGATACGGCGACCACCGAGA-3′; and MWS-20: 5′-CAAGCAGAAGACGGCATACGAGATXXXXXXGTGACTGGAGTTCAGACGTGTGC-3′, where XXXXXX indicates the position of a fixed multiplexing barcode sequence). Following PCR amplification, the adapters contain all flow-cell and sequencing primer binding sites required for the Illumina TruSeq system. DNA sequencing was then performed on the Illumina HiSeq 2000 system, per the manufacturer’s recommendations, using the paired-end, dual-indexing protocol.

Data Processing.

Bioinformatic processing of the resulting raw sequence data combined publicly available tools with software written in-house (available upon request). All raw sequencing reads were filtered based on the requirement that each position of the 16-nt RNA barcode in read 4 and the 8-nt cDNA barcode in read 3 be one of the four canonical bases; any reads not conforming to this criterion were discarded. The 16-nt RNA barcode sequence from read 4 and the 8-nt cDNA barcode sequence from read 3 were computationally added to the read header to give a combined 24-nt tag for each read pair. Because the first three cycles of read 3 are dark, the original 11-nt N-mer yields an 8-nt cDNA barcode. The reads were then aligned to the reference genome with the Burrows–Wheeler Aligner (BWA) (49), and nonmapping reads were discarded. The m13mp18 genome was used as the reference for our m13 in vitro transcription experiments, and the S288C yeast genome was used as the reference for our yeast in vivo transcription experiments. Reads sharing identical 24-nt tag sequences were then grouped together and collapsed into PCR consensus reads of individual cDNA molecules. No minimum number of family members was required to form a PCR consensus; however, if multiple family members were present, a position was considered undefined if less than 99% of the sequences at that position were in agreement. This ensured that errors introduced during PCR and sequencing were eliminated. The PCR consensus reads were then realigned to the reference genome using BWA, again discarding reads left unmapped because of too many undefined positions.
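The barcode filter and the PCR consensus rule can be illustrated with a short Python sketch. It is not the in-house software: the names `valid_tag` and `pcr_consensus` are hypothetical, reads in a family are assumed to be already aligned and of equal length, and undefined positions are written as `N` as a convention chosen for this example.

```python
from collections import Counter

CANONICAL = set("ACGT")

def valid_tag(rna_barcode, cdna_barcode):
    """True if both barcodes have the expected length and only A/C/G/T."""
    return (
        len(rna_barcode) == 16
        and len(cdna_barcode) == 8
        and set(rna_barcode + cdna_barcode) <= CANONICAL
    )

def pcr_consensus(seqs, min_agreement=0.99):
    """Collapse reads sharing one 24-nt tag into a single consensus read.

    A position is reported as 'N' (undefined) when less than
    `min_agreement` of the reads agree on the most common base.
    A single-member family is returned unchanged.
    """
    consensus = []
    for column in zip(*seqs):
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base if count / len(column) >= min_agreement else "N")
    return "".join(consensus)
```

For example, `pcr_consensus(["ACGT", "ACGT", "ACGA"])` yields `"ACGN"`, because only two of three reads agree at the last position.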

PCR consensus reads sharing identical 16-nt RNA barcodes were then grouped together and collapsed into cDNA consensus families. Sequencing positions were discounted if the consensus family covering that position consisted of fewer than n members or if less than 70% of the sequences at the position were identical; n represents the minimum number of cDNA copies required for each RNA molecule and could be increased or decreased to adjust the stringency of the error correction of the assay. For example, in our m13 experiment, to represent conventional RNAseq, we set n = 1, whereas ARC-seq was set at n = 3 for high-fidelity RNA and n = 6 for damaged RNA. This step removes errors introduced during reverse transcription. The cDNA consensus reads were then realigned to the reference genome using BWA, again discarding reads left unmapped because of too many undefined positions in the cDNA consensus families. Finally, alignment artifacts were removed by soft-clipping read ends using the Genome Analysis Toolkit, and mutation frequencies were determined by dividing the number of RNA mutations by the total number of nucleotides sequenced for each RNA subtype. For mRNA mutations, only those RNA mutations observed within mRNA-mapping sequences were counted; likewise for rRNA mutations. An overview of the workflow is provided below.
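The cDNA consensus rule and the mutation frequency calculation can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' code: the function names are hypothetical, consensus reads are assumed to be aligned and of equal length, undefined positions are written as `N`, and undefined positions in the input are assumed not to count toward n.

```python
from collections import Counter

def cdna_consensus(pcr_reads, n=3, min_agreement=0.70):
    """Collapse PCR consensus reads sharing one 16-nt RNA barcode.

    A position is undefined ('N') if fewer than `n` defined bases cover
    it or if less than `min_agreement` of them are identical.
    """
    consensus = []
    for column in zip(*pcr_reads):
        defined = [b for b in column if b != "N"]  # assumption: N does not count
        if len(defined) < n:
            consensus.append("N")
            continue
        base, count = Counter(defined).most_common(1)[0]
        consensus.append(base if count / len(defined) >= min_agreement else "N")
    return "".join(consensus)

def mutation_frequency(consensus_reads, reference_seqs):
    """Mismatches divided by defined nucleotides sequenced."""
    mutations = sequenced = 0
    for cons, ref in zip(consensus_reads, reference_seqs):
        for c, r in zip(cons, ref):
            if c == "N":
                continue  # undefined positions are not scored
            sequenced += 1
            mutations += c != r
    return mutations / sequenced if sequenced else float("nan")
```

Setting `n=1` in this sketch mimics the conventional RNAseq comparison described above, while `n=3` or `n=6` raises the stringency; a family of only two cDNA copies returns all `N` at `n=3`.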

Overview of ARC-Seq Data Processing.

i) Combine the 16-nt tag from read 4 and the 8-nt tag from read 3, and append to the read header.
ii) Align reads to the reference genome, and discard nonmapping reads.
iii) Group together reads that have identical 24-nt tags, representing PCR duplicates of individual cDNA molecules.
iv) Collapse PCR duplicates into cDNA consensus families, scoring only positions having >99% sequence identity among the duplicates.
v) Realign reads to the reference genome.
vi) Group together reads whose first 16 nt of their 24-nt tags are identical, representing cDNA duplicates of individual RNA molecules.
vii) Collapse cDNA duplicates into RNA consensus families, scoring only positions represented by n or more cDNA duplicates and having >70% sequence identity among the duplicates, where n is the number of cDNA duplicates determined necessary to eliminate reverse transcriptase errors.
viii) Realign reads to the reference genome.
ix) Trim read ends to remove alignment errors.
x) Score mutations.

Acknowledgments

We thank Scott Kennedy for assistance with bioinformatics; Edward Fox, Scott Kennedy, Gwen Garden, Peter Rabinovitch, and Alan Weiner for helpful discussions; and Tom Walsh and Ming Lee for assistance with sequencing. This work was supported by NIH NCI Grants P01-CA77852 and R01-CA160674 (to L.A.L.) and National Institute on Aging Grant T32 AG000057 (to K.S.R.-B.).

Footnotes

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

Data deposition: The raw sequence files reported in this paper have been deposited in the National Center for Biotechnology Information’s Sequence Read Archive (BioProject accession no. PRJNA396053).

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1709166114/-/DCSupplemental.

References

1. Paoloni-Giacobino A, Rossier C, Papasavvas MP, Antonarakis SE. Frequency of replication/transcription errors in (A)/(T) runs of human genes. Hum Genet. 2001;109:40–47. doi: 10.1007/s004390100541.
2. Rodin SN, Rodin AS, Juhasz A, Holmquist GP. Cancerous hyper-mutagenesis in p53 genes is possibly associated with transcriptional bypass of DNA lesions. Mutat Res. 2002;510:153–168. doi: 10.1016/s0027-5107(02)00260-9.
3. Hubbard K, Catalano J, Puri RK, Gnatt A. Knockdown of TFIIS by RNA silencing inhibits cancer cell proliferation and induces apoptosis. BMC Cancer. 2008;8:133. doi: 10.1186/1471-2407-8-133.
4. van Leeuwen FW, et al. Frameshift mutants of beta amyloid precursor protein and ubiquitin-B in Alzheimer’s and Down patients. Science. 1998;279:242–247. doi: 10.1126/science.279.5348.242.
5. van Leeuwen FW, Burbach JP, Hol EM. Mutations in RNA: A first example of molecular misreading in Alzheimer’s disease. Trends Neurosci. 1998;21:331–335. doi: 10.1016/s0166-2236(98)01280-6.
6. Morreall JF, Petrova L, Doetsch PW. Transcriptional mutagenesis and its potential roles in the etiology of cancer and bacterial antibiotic resistance. J Cell Physiol. 2013;228:2257–2261. doi: 10.1002/jcp.24400.
7. Vignuzzi M, Stone JK, Arnold JJ, Cameron CE, Andino R. Quasispecies diversity determines pathogenesis through cooperative interactions in a viral population. Nature. 2006;439:344–348. doi: 10.1038/nature04388.
8. Coffey LL, et al. Arbovirus evolution in vivo is constrained by host alternation. Proc Natl Acad Sci USA. 2008;105:6970–6975. doi: 10.1073/pnas.0712130105.
9. Blank A, Gallant JA, Burgess RR, Loeb LA. An RNA polymerase mutant with reduced accuracy of chain elongation. Biochemistry. 1986;25:5920–5928. doi: 10.1021/bi00368a013.
10. Rosenberger RF, Hilton J. The frequency of transcriptional and translational errors at nonsense codons in the lacZ gene of Escherichia coli. Mol Gen Genet. 1983;191:207–212. doi: 10.1007/BF00334815.
11. Ninio J. Connections between translation, transcription and replication error-rates. Biochimie. 1991;73:1517–1523. doi: 10.1016/0300-9084(91)90186-5.
12. Remington KM, Bennett SE, Harris CM, Harris TM, Bebenek K. Highly mutagenic bypass synthesis by T7 RNA polymerase of site-specific benzo[a]pyrene diol epoxide-adducted template DNA. J Biol Chem. 1998;273:13170–13176. doi: 10.1074/jbc.273.21.13170.
13. Kireeva ML, et al. Transient reversal of RNA polymerase II active site closing controls fidelity of transcription elongation. Mol Cell. 2008;30:557–566. doi: 10.1016/j.molcel.2008.04.017.
14. Doetsch PW. Translesion synthesis by RNA polymerases: Occurrence and biological implications for transcriptional mutagenesis. Mutat Res. 2002;510:131–140. doi: 10.1016/s0027-5107(02)00258-0.
15. Viswanathan A, You HJ, Doetsch PW. Phenotypic change caused by transcriptional bypass of uracil in nondividing cells. Science. 1999;284:159–162. doi: 10.1126/science.284.5411.159.
16. Brégeon D, Doddridge ZA, You HJ, Weiss B, Doetsch PW. Transcriptional mutagenesis induced by uracil and 8-oxoguanine in Escherichia coli. Mol Cell. 2003;12:959–970. doi: 10.1016/s1097-2765(03)00360-5.
17. Pastoriza-Gallego M, Armier J, Sarasin A. Transcription through 8-oxoguanine in DNA repair-proficient and Csb(-)/Ogg1(-) DNA repair-deficient mouse embryonic fibroblasts is dependent upon promoter strength and sequence context. Mutagenesis. 2007;22:343–351. doi: 10.1093/mutage/gem024.
18. Saxowsky TT, Meadows KL, Klungland A, Doetsch PW. 8-Oxoguanine-mediated transcriptional mutagenesis causes Ras activation in mammalian cells. Proc Natl Acad Sci USA. 2008;105:18877–18882. doi: 10.1073/pnas.0806464105.
19. Burns JA, Dreij K, Cartularo L, Scicchitano DA. O6-methylguanine induces altered proteins at the level of transcription in human cells. Nucleic Acids Res. 2010;38:8178–8187. doi: 10.1093/nar/gkq706.
20. Gordon AJ, Satory D, Halliday JA, Herman C. Heritable change caused by transient transcription errors. PLoS Genet. 2013;9:e1003595. doi: 10.1371/journal.pgen.1003595.
21. Gordon AJ, et al. Transcriptional infidelity promotes heritable phenotypic change in a bistable gene network. PLoS Biol. 2009;7:e44. doi: 10.1371/journal.pbio.1000044.
22. Strathern JN, Jin DJ, Court DL, Kashlev M. Isolation and characterization of transcription fidelity mutants. Biochim Biophys Acta. 2012;1819:694–699. doi: 10.1016/j.bbagrm.2012.02.005.
23. Ji JP, Loeb LA. Fidelity of HIV-1 reverse transcriptase copying RNA in vitro. Biochemistry. 1992;31:954–958. doi: 10.1021/bi00119a002.
24. Minoche AE, Dohm JC, Himmelbauer H. Evaluation of genomic high-throughput sequencing data generated on Illumina HiSeq and genome analyzer systems. Genome Biol. 2011;12:R112. doi: 10.1186/gb-2011-12-11-r112.
25. Gout JF, Thomas WK, Smith Z, Okamoto K, Lynch M. Large-scale detection of in vivo transcription errors. Proc Natl Acad Sci USA. 2013;110:18584–18589. doi: 10.1073/pnas.1309843110.
26. Imashimizu M, Oshima T, Lubkowska L, Kashlev M. Direct assessment of transcription fidelity by high-resolution RNA sequencing. Nucleic Acids Res. 2013;41:9090–9104. doi: 10.1093/nar/gkt698.
27. Jabara CB, Jones CD, Roach J, Anderson JA, Swanstrom R. Accurate sampling and deep sequencing of the HIV-1 protease gene using a Primer ID. Proc Natl Acad Sci USA. 2011;108:20166–20171. doi: 10.1073/pnas.1110064108.
28. Acevedo A, Brodsky L, Andino R. Mutational and fitness landscapes of an RNA virus revealed through population sequencing. Nature. 2014;505:686–690. doi: 10.1038/nature12861.
29. Acevedo A, Andino R. Library preparation for highly accurate population sequencing of RNA viruses. Nat Protoc. 2014;9:1760–1769. doi: 10.1038/nprot.2014.118.
30. Fox EJ, Reid-Bayliss KS, Emond MJ, Loeb LA. Accuracy of next generation sequencing platforms. Next Gener Seq Appl. 2014;1:106. doi: 10.4172/jngsa.1000106.
31. Walmacq C, et al. Rpb9 subunit controls transcription fidelity by delaying NTP sequestration in RNA polymerase II. J Biol Chem. 2009;284:19601–19612. doi: 10.1074/jbc.M109.006908.
32. Kasai H, Tanooka H, Nishimura S. Formation of 8-hydroxyguanine residues in DNA by X-irradiation. Gan. 1984;75:1037–1039.
33. Lee DH, Pfeifer GP. Translesion synthesis of 7,8-dihydro-8-oxo-2'-deoxyguanosine by DNA polymerase eta in vivo. Mutat Res. 2008;641:19–26. doi: 10.1016/j.mrfmmm.2008.02.006.
34. De Bont R, van Larebeke N. Endogenous DNA damage in humans: A review of quantitative data. Mutagenesis. 2004;19:169–185. doi: 10.1093/mutage/geh025.
35. Cooke MS, et al. Oxidative DNA damage: Mechanisms, mutation, and disease. FASEB J. 2003;17:1195–1214. doi: 10.1096/fj.02-0752rev.
36. Lindahl T. DNA glycosylases, endonucleases for apurinic/apyrimidinic sites, and base excision-repair. Prog Nucleic Acid Res Mol Biol. 1979;22:135–192. doi: 10.1016/s0079-6603(08)60800-4.
37. Orgel LE. The maintenance of the accuracy of protein synthesis and its relevance to ageing. Proc Natl Acad Sci USA. 1963;49:517–521. doi: 10.1073/pnas.49.4.517.
38. Orgel LE. The maintenance of the accuracy of protein synthesis and its relevance to ageing: A correction. Proc Natl Acad Sci USA. 1970;67:1476. doi: 10.1073/pnas.67.3.1476.
39. Martin GM, Bressler SL. Transcriptional infidelity in aging cells and its relevance for the Orgel hypothesis. Neurobiol Aging. 2000;21:897–900; discussion 903–904. doi: 10.1016/s0197-4580(00)00193-7.
40. Zhou S, et al. Primer ID validates template sampling depth and greatly reduces the error rate of next-generation sequencing of HIV-1 genomic RNA populations. J Virol. 2015;89:8540–8555. doi: 10.1128/JVI.00522-15.
41. Traverse CC, Ochman H. Conserved rates and patterns of transcription errors across bacterial growth states and lifestyles. Proc Natl Acad Sci USA. 2016;113:3311–3316. doi: 10.1073/pnas.1525329113.
42. Mamanova L, et al. Target-enrichment strategies for next-generation sequencing. Nat Methods. 2010;7:111–118. doi: 10.1038/nmeth.1419.
43. Schmitt MW, et al. Sequencing small genomic targets with high efficiency and extreme accuracy. Nat Methods. 2015;12:423–425. doi: 10.1038/nmeth.3351.
44. Frohman MA, Dush MK, Martin GR. Rapid production of full-length cDNAs from rare transcripts: Amplification using a single gene-specific oligonucleotide primer. Proc Natl Acad Sci USA. 1988;85:8998–9002. doi: 10.1073/pnas.85.23.8998.
45. Brégeon D, Doetsch PW. Transcriptional mutagenesis: Causes and involvement in tumor development. Nat Rev Cancer. 2011;11:218–227. doi: 10.1038/nrc3006.
46. Korencic D, Soll D, Ambrogelly A. A one-step method for in vitro production of tRNA transcripts. Nucleic Acids Res. 2002;30:e105. doi: 10.1093/nar/gnf104.
47. McBride TJ, Preston BD, Loeb LA. Mutagenic spectrum resulting from DNA damage by oxygen radicals. Biochemistry. 1991;30:207–213. doi: 10.1021/bi00215a030.
48. Klassen R, et al. A modified DNA isolation protocol for obtaining pure RT-PCR grade RNA. Biotechnol Lett. 2008;30:1041–1044. doi: 10.1007/s10529-008-9648-y.
49. Li H, Durbin R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics. 2009;25:1754–1760. doi: 10.1093/bioinformatics/btp324.

