Significance
Accurate transcription is required for the faithful expression of genetic information. To identify molecular mechanisms that control the fidelity of transcription, we monitored transcriptional mutagenesis in human embryonic stem cells. These measurements provide the first rigorous estimate of the fidelity of transcription in human cells and identify multiple genetic and epigenetic factors that modulate the error rate. In addition, we developed a new reporter mouse that suggests that neurons, including hippocampal neurons, are highly sensitive to transcriptional mutagenesis, lending new support to the hypothesis that transcription errors play a role in various neurological disorders, including Alzheimer’s disease. These experiments provide unprecedented insights into the fidelity of gene expression and the molecular mechanisms that underpin the central dogma of life.
Keywords: transcription, transcription errors, mutagenesis, Alzheimer's disease, human embryonic stem cells
Abstract
To determine the error rate of transcription in human cells, we analyzed the transcriptome of H1 human embryonic stem cells with a circle-sequencing approach that allows for high-fidelity sequencing of the transcriptome. These experiments identified approximately 100,000 errors distributed over every major RNA species in human cells. Our results indicate that different RNA species display different error rates, suggesting that human cells prioritize the fidelity of some RNAs over others. Cross-referencing the errors that we detected with various genetic and epigenetic features of the human genome revealed that the in vivo error rate in human cells changes along the length of a transcript and is further modified by genetic context, repetitive elements, epigenetic markers, and the speed of transcription. Our experiments further suggest that BRCA1, a DNA repair protein implicated in breast cancer, has a previously unknown role in the suppression of transcription errors. Finally, we analyzed the distribution of transcription errors in multiple tissues of a new mouse model and found that they occur preferentially in neurons, compared to other cell types. These observations lend additional weight to the idea that transcription errors play a key role in the progression of various neurological disorders, including Alzheimer’s disease.
The human genome provides a precise, biological blueprint of life. To implement this blueprint correctly, it is essential that our genome is transcribed with the utmost precision. Sporadic errors are unavoidable though, and these errors reveal how important transcriptional fidelity is for cellular health. For example, in patients with nonfamilial cases of Alzheimer’s disease, transcription errors generate toxic APP and UBB peptides that are part of the neuropathological hallmarks that characterize the disease (1, 2), suggesting that they contribute to disease progression. The toxic UBB peptide (UBB+1) is also a potent inhibitor of the proteasome ubiquitin complex (3) and can be found in protein aggregates with other tauopathies and polyglutamine repeat disorders (1, 4–6), suggesting that transcription errors contribute to other protein misfolding diseases as well. Intriguingly, errors in the UBB and APP genes occur repeatedly at the same location, creating many copies of the same mutant protein. Random transcription errors (errors that occur only once at each position) contribute to protein misfolding diseases as well. While studying yeast cells that display error-prone transcription, we found that these errors tend to compromise the structural integrity of proteins and induce protein misfolding at a global scale (7). Although most of these misfolded proteins are benign, their sheer volume can overwhelm the protein quality control machinery and prevent the degradation of toxic proteins that are normally targets for this machinery, including Aβ1-42 (AD), TDP-43 (amyotrophic lateral sclerosis and frontotemporal dementia), HTT103Q (Huntington’s disease), and RNQ1 (7) (a prion).
Taken together, these observations suggest that transcription errors can affect proteotoxic diseases by two complementary mechanisms: i) They can generate specific proteins that are associated with the disease and ii) they can create the environment that allows these proteins to persist and seed aggregates. Transcription errors affect other cellular processes as well though. For example, they can induce oncogenic pathways in human cells (8), limit the lifespan of yeast (7), compromise the metabolism of nicotinamide adenine dinucleotide (NAD), nucleotides, and amino acids (9), and change the fate of bacterial cells (10).
However, despite our increased understanding of the impact of transcription errors on cellular health, relatively little is known about the molecular mechanisms that control the fidelity of transcription, especially in human cells. To address this issue, we measured the error rate of transcription in H1 human embryonic stem cells (H1 hESCs). We chose these cells because they were extensively characterized by the human ENCODE project (11) and their molecular features are known in great detail. Accordingly, we were able to cross-reference the errors that we detected with the ENCODE dataset, so that the role of histone markers, DNA binding proteins, and methylated bases in transcriptional mutagenesis could be investigated. In addition, we developed a new reporter mouse that revealed that neurons are prone to transcriptional mutagenesis. Interestingly, neurons implicated in Alzheimer’s disease and other proteotoxic disorders seem to be especially prone to committing errors. These observations provide new support for the hypothesis that transcription errors play a role in the etiology of Alzheimer’s disease (1, 2).
Results
Data Overview.
To determine how accurately the human genome is transcribed, we sequenced the transcriptome of H1 hESCs with an optimized version (9) of the “circle-sequencing assay” (12, 13) (Fig. 1A). We then compared their transcriptome to a custom-made reference genome (300× coverage) to identify transcription errors (Fig. 1 B and C and SI Appendix, Fig. S1). In addition, we used these datasets to demonstrate that all replicates had a similar transcriptomic profile, indicating that their health and overall status were comparable (SI Appendix, Fig. S2). This analysis yielded 101,884 transcription errors, distributed over every major RNA species in human cells (Fig. 1C), including messenger RNA (mRNA), ribosomal RNA (rRNA), transfer RNA, small nuclear RNA, small nucleolar RNA, ribozymes, and various classes of pseudogenes. Because these transcripts are synthesized by different RNA polymerases (RNAPs), this dataset allowed us to determine the accuracy of every major nuclear RNAP in human cells (Figs. 1 C–F and 2A). In addition, we detected a substantial number of transcripts derived from the mitochondrial genome, which allowed us to determine the accuracy of the mitochondrial RNAP as well (Figs. 1 C–F and 2A). Finally, we combined these errors into a single file that can be downloaded from the SRA archive (14).
Fig. 1.
Overview of dataset. A. Concept of the circle-sequencing assay. Traditional RNA-seq assays introduce reverse transcription errors (blue circles) during the generation of cDNA libraries (orange lines, the RNA template is in blue) that are indistinguishable from true transcription errors (red circles). In addition, sequencing mistakes (yellow circles) arise during massively parallel sequencing reactions that can be mistaken for transcription errors. To circumvent these pitfalls, RNA molecules are ligated to themselves and circularized prior to reverse transcription. These circular molecules can then be reverse transcribed in a rolling circle fashion to generate linear cDNA molecules that consist of multiple concatenated copies of the original RNA template. If a transcription error was present in the template, this error will be present in every concatenated copy, while reverse transcription errors or sequencing mistakes will only be present in one or two copies, allowing true transcription errors to be distinguished from artifacts. B. The median coverage per base of each chromosome by DNA-sequencing to generate a reference genome. C. The distribution of RNA bases sequenced and the errors detected in transcripts that were synthesized by different RNAPs. D–F. Transcription errors detected in mRNAs (D), rRNA (E), and mitochondrial RNA (F). Genes and chromosomes are located on the outer ring, followed by the coverage of these sequences, base substitutions per 10 kb or 100 bps in red, insertions per 10 kb or 100 bps in blue, deletions per 10 kb or 100 bps, and the error rate/bp in yellow over these intervals. The inner ring represents the error rate.
Fig. 2.
Genetic determinants of the error rate of transcription. A–D. Different RNAPs display different error rates (A) and spectra (B–D). E. Different transcripts generated by RNAPII display different error rates. The numbers above the bars indicate the number of errors detected in each class of RNAs. F. The error rate of transcription decreases near the stop codon. G. Most single-base substitutions decrease near the stop codon (interval 25 to 32 in F). H. Insertions increase near the stop codon. Please note that the insertion and base substitution error rate have been normalized to each other, to better display the qualitative differences between them. In addition, the intervals have been condensed by a factor of 2 compared to F. For example, interval 10 equals interval 20 in F. I. The error rate of transcription is highest if the next base that needs to be inserted is a pyrimidine.
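The consensus logic described in the legend of Fig. 1A can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the pipeline used in this study: the function name `call_consensus`, the requirement of at least three copies, and the assumption that the concatenated copies have already been split and aligned to a reference of equal length are all hypothetical.

```python
from collections import Counter


def call_consensus(copies, reference):
    """Classify mismatches seen in the concatenated copies of one RNA template.

    copies:    equal-length strings, one per repeat of the template found in a
               rolling-circle cDNA read (assumed to be split and aligned already).
    reference: reference sequence of the same length.
    Returns (candidate_errors, artifacts), each a list of
    (position, reference_base, observed_base) tuples.
    """
    if len(copies) < 3:
        # Hypothetical cutoff: with fewer copies there is no meaningful consensus.
        raise ValueError("need at least three copies to build a consensus")

    candidate_errors, artifacts = [], []
    for pos, ref_base in enumerate(reference):
        column = Counter(copy[pos] for copy in copies)
        for base, count in column.items():
            if base == ref_base:
                continue
            if count == len(copies):
                # Mismatch present in every copy: consistent with an error
                # that was already present in the RNA template.
                candidate_errors.append((pos, ref_base, base))
            else:
                # Mismatch present in only some copies: treated as a reverse
                # transcription or sequencing artifact.
                artifacts.append((pos, ref_base, base))
    return candidate_errors, artifacts


reference = "ACGTACGT"
copies = ["ACGAACGT", "ACGAACGT", "ACGAACTT"]
errors, artifacts = call_consensus(copies, reference)
print(errors)     # [(3, 'T', 'A')]
print(artifacts)  # [(6, 'G', 'T')]
```

In this toy example, the T→A mismatch at position 3 appears in all three copies and is kept as a candidate transcription error, whereas the G→T mismatch at position 6 appears in a single copy and is set aside as an artifact.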
Human Cells Prioritize the Fidelity of Specific Transcripts.
To determine the parameters that control the error rate of transcription, we first compared the accuracy of different RNAPs to each other. This analysis demonstrated that RNAPI (4.1 × 10⁻⁶/bp) and II (4.2 × 10⁻⁶/bp) make the fewest mistakes, followed by the mitochondrial RNAP (mtRNAP, 1.0 × 10⁻⁵/bp) and RNAPIII (1.3 × 10⁻⁵/bp). Thus, human cells prioritize the fidelity of RNAPI and II over other polymerases. We previously observed a similar trend in other organisms, indicating that this feature may be evolutionarily conserved (9). The increased error rates of RNAPIII and mtRNAP are fueled by the unique error spectra of these polymerases. For example, RNAPIII makes five times more G→A errors than RNAPI or II, which drives its error-prone phenotype (Fig. 2 B–D). It is important to note though that despite the relatively high fidelity of RNAPI and II, the error rate of transcription is still >100-fold higher than the mutation rate (15), underscoring the vast potential of transcription errors to create mutant proteins.
Because our data suggest that human cells prioritize the fidelity of some transcripts over others, we wondered how expansive this prioritization process is. To explore this issue, we examined how accurate RNAPII is when it transcribes different classes of genes. This analysis revealed that RNAPII makes fewer mistakes when it transcribes genes that encode microRNAs and proteins compared to pseudogenes, ribozymes, and lncRNAs (Fig. 2E). To gain more insights into the molecular mechanisms that might be responsible for this observation, we analyzed the error spectrum of RNAPII on ribozymes and lncRNAs, the two templates with the highest error rates. Interestingly, we found that transcripts that encode ribozymes display a sharp increase in the rate of insertions, deletions, and G→A errors, a signature that is highly reminiscent of RNAs transcribed by RNAPII without its fidelity factors (9) (SI Appendix, Fig. S4 A and B). Thus, one potential mechanism by which RNAPII could alter the fidelity of transcription is the inclusion or exclusion of fidelity factors.
A different mechanism seems to be responsible for the increased error rate of lncRNAs because in contrast to ribozymes, these RNAs display an increased rate for every possible transcription error (SI Appendix, Fig. S4 C and D). Intriguingly, we made the inverse observation when we examined the fidelity of transcription in protein-coding genes: the error rate of transcription is not equal across the length of these genes. Instead, nearly every error decreases in frequency as RNAPII approaches the stop codon (Fig. 2 F and G). One potential mechanism to alter the error rate of many, or all, possible errors would be to alter the speed of RNA synthesis. As is the case for DNA polymerases (16), the speed and accuracy of RNAPs are inversely correlated with each other (17, 18), so that the faster an RNAP works, the less accurate it is. It is important to note though that, in contrast to base substitutions, insertions and deletions do increase in frequency near the stop codon (Fig. 2H and SI Appendix, Fig. S5). Most likely, this pattern is caused by a reduction in the efficiency of nonsense-mediated mRNA decay (NMD). An important distinction between indels and base substitutions is that indels frequently result in premature stop codons that trigger NMD (19). However, it is increasingly difficult for a cell to distinguish between a premature stop codon and a native stop codon if these codons are located close to each other. Accordingly, the increased number of insertions and deletions near the native stop codon is most likely due to the reduced ability of NMD to detect these events. We previously confirmed this idea in yeast cells (9).
The Primary Sequence of the Human Genome Alters the Fidelity of Transcription.
Next, we investigated whether the fidelity of transcription is affected by the primary sequence of the human genome. To answer this question, we examined the error rate of transcription in different genetic contexts. First, we computed the error rate of RNAPII on all four nucleotides and then tested whether these error rates change depending on their 5′ and 3′ neighbors. This analysis indicated that this is indeed the case, with the 3′ base being the most important determinant for the fidelity of transcription (Fig. 2I). If the next base that needs to be inserted is a pyrimidine, the error rate tends to be lower than when it is a purine. Because the incorporation of a pyrimidine is slightly slower than that of a purine, it is possible that the speed of incorporation plays a role in this phenomenon as well (20). A slower incorporation rate would provide RNAPII with more time to remove a misincorporated base. Regardless, we observed a similar trend in yeast cells, indicating that this molecular phenomenon is evolutionarily conserved (9). Finally, we created similar genetic context maps for rRNA and mtRNA (SI Appendix, Fig. S6).
We also examined the error rate of transcription on tracts of mono- and di-nucleotide repeats. RNAPII tends to slip on these tracts in model organisms (9, 21, 22), and similar slippage events have been observed in humans. For example, in patients with Down syndrome and nonfamilial cases of Alzheimer’s disease, RNAPII was found to slip on two dinucleotide tracts in the APP and UBB genes, resulting in shortened peptides that are part of the neuropathological hallmarks (plaques and tangles) that characterize the disease (1, 2). However, it is unknown how frequently these slippage events occur in human cells, which obscures their impact on human aging and disease. To fill this gap in our knowledge, we compared the error rate of RNAPII on mono- and di-nucleotide repeats to the rest of the human genome. This analysis demonstrated that repeats raise the error rate of insertions and deletions up to 200-fold (Fig. 3 A–C), an increase that is proportional to the length of the tracts. In addition, we found that some repeats result in smaller but significant increases in base substitutions (Fig. 3 A–C). These observations suggest that toxic APP and UBB peptides are generated at an accelerated rate in human cells, which could promote the development of proteotoxic diseases.
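To make the notion of mono- and di-nucleotide tracts concrete, the following minimal sketch locates such tracts in a DNA sequence with a regular expression. It is an illustration only and not the procedure used in this study; the function name `find_repeat_tracts`, its parameters, and the toy sequence are hypothetical, and the sequence is assumed to contain only the letters A, C, G, and T.

```python
import re


def find_repeat_tracts(seq, unit_len, min_units):
    """Return (start, end, unit) for tandem repeats of a unit_len-base motif
    repeated at least min_units times. Coordinates are 0-based, end exclusive."""
    if min_units < 2:
        raise ValueError("min_units must be at least 2")
    # One captured unit followed by at least (min_units - 1) identical copies.
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_units - 1))
    tracts = []
    for match in pattern.finditer(seq):
        unit = match.group(1)
        if unit_len > 1 and len(set(unit)) == 1:
            # "AA" repeated is a mononucleotide tract, not a dinucleotide tract.
            continue
        tracts.append((match.start(), match.end(), unit))
    return tracts


seq = "GAGAGAGAG" + "TTTTTTTT" + "C"
print(find_repeat_tracts(seq, unit_len=1, min_units=8))  # [(9, 17, 'T')]
print(find_repeat_tracts(seq, unit_len=2, min_units=4))  # [(0, 8, 'GA')]
```

With coordinates of this kind in hand, errors that fall inside a tract can be tallied separately from errors elsewhere, which is the comparison that underlies the fold increases reported above.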
Fig. 3.
RNAPII slips on mono- and di-nucleotide tracts. A. Increased rates of insertions and deletions along a mononucleotide tract of adenines. Slippage events increase on sequences that are made up of six or eight thymine bases in the DNA. B. Increased transcription errors on all possible mononucleotide tracts made up of eight bases. C. Increased transcription errors on dinucleotide tracts made up of eight bases. Please note that for panels B and C, the WT and mutant bases of all six replicates have been combined because of the relatively low coverage of these repeat sequences across the transcriptome. D. Genetic construct inserted in C57BL/6J mice that contains a mutated version of the Cre gene. This gene has been placed out of frame and contains a slippery tract of 10 adenines in a row. Upon an insertion or deletion that places Cre back in frame, a short burst of WT Cre proteins is generated that can excise a Neo-cassette that interrupts an EYFP gene, leading to constitutive EYFP expression. E. A positive control where EYFP was activated in every intestinal cell by a Sox2-driven WT-Cre. F. EYFP expression is nearly absent in mice that express the Creout gene. G. EYFP+ cells are also sparse in the liver. H and I. Large numbers of EYFP+ cells in the CA1 but not CA2 region of the hippocampus (H) and the dentate gyrus (I, D.G.). An overview of the entire hippocampus is provided in panel J. The Purkinje cells in the cerebellum were also frequently EYFP+ (K, Purkinje cells) in Creout; EYFPneo mice.
Neurons Are Prone to Transcriptional Mutagenesis.
Because transcriptional slippage events are implicated in Alzheimer’s disease and other neuronal diseases characterized by protein misfolding, we decided to investigate which cell types commit these errors the most. To do so, we developed a new mouse model that can be used to identify cell types and tissues that have experienced a slippage event. This mouse model carries a Cre-recombinase gene that was placed out of frame by a 1-bp deletion inside a slippery tract of 10 adenines (Creout, Fig. 3D). Because of this frameshift, the Cre gene can only produce functional Cre proteins if RNAPII slips on the mononucleotide tract and places the Cre transcript back in frame. The proteins made from this transcript can then form a biologically active tetramer and excise a neo-cassette that interrupts the open reading frame of EYFP (EYFPneo). After excision of this cassette, EYFP is permanently activated, so that every cell that has undergone a slippage event is permanently labeled by a constant and uninterrupted fluorescent signal. It is further important to note that translation errors cannot result in EYFP activation because they only generate a single Cre protein, which is insufficient to create a biologically active tetramer. Mutations, on the other hand, could result in EYFP activation, but because these events occur 100-fold less frequently than transcription errors, it is unlikely that they result in large numbers of EYFP+ cells.
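The reading-frame arithmetic behind the Creout reporter can be illustrated with a short sketch. It assumes only what is stated above, namely that the reporter starts with a net shift of one deleted base, and it ignores any other requirement for a functional protein. The function name `restores_frame` and its parameters are hypothetical.

```python
def restores_frame(net_indel, initial_shift=-1):
    """Return True if a slippage event of net_indel bases (positive for an
    insertion, negative for a deletion) brings a reading frame that starts
    out shifted by initial_shift bases back in frame."""
    return (initial_shift + net_indel) % 3 == 0


for net_indel in (-2, -1, 1, 2):
    print(net_indel, restores_frame(net_indel))
```

Under these assumptions, a 1-base insertion (net shift of 0) or a 2-base deletion (net shift of −3) on the slippery tract restores the frame, whereas a 1-base deletion or a 2-base insertion does not.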
We then aged Creout; EYFPneo mice for 1 y to allow EYFP+ cells to accumulate and stained intestine, liver, and brain tissue with EYFP-antibodies (Fig. 3 E–K). Interestingly, we found that liver and intestine contained few EYFP+ cells (Fig. 3 F and G). However, EYFP+ cells did accumulate in specific structures of the brain. Most notably, we detected many EYFP+ cells in the hippocampus, a region that is directly implicated in Alzheimer’s disease (23). Most EYFP+ cells were found in the dentate gyrus, the CA1 and CA3 neurons of the hippocampus, and the subiculum (Fig. 3 H–J), brain regions that are pivotal to memory formation and the pathology of Alzheimer’s disease. In contrast, neighboring CA2 cells, which are relatively spared in Alzheimer’s disease (24), were rarely EYFP-positive, indicating remarkable regional differences in the error rate of transcription between related neurons. Similarly, neighboring cells were rarely EYFP-positive as well, indicating large differences in the error rate between cell types within a single tissue. These results indicate that in addition to genes involved in Alzheimer’s disease being error prone, critical cell types in the hippocampus are error prone as well. Together, these phenomena could create a constant stream of toxic APP and UBB peptides that contribute to Alzheimer’s disease progression. In addition, we identified numerous EYFP+ Purkinje cells in the cerebellum (Fig. 3K). Importantly, Purkinje cells are highly sensitive to protein aggregation as well (25) and directly implicated in prion disease (26), suggesting that error-prone transcription could be a feature of multiple cell types that are sensitive to proteotoxic stress.
The Strength of Transcription Is Directly Coupled to Its Fidelity.
Next, we considered how epigenetic factors might affect the fidelity of transcription. To answer this question, we cross-referenced the errors that we detected with epigenetic modifications that were cataloged in H1 hESCs by the human ENCODE project (11) and the NIH Roadmap Epigenomics Mapping Consortium. This analysis revealed that several histone modifications associated with active transcription, including H3K4me1, H3K4me2, and H3K4me3 marks, correlate with increased rates of base substitutions (Fig. 4A), insertions, and deletions (SI Appendix, Fig. S7). To determine whether the opposite is true as well (i.e., that markers associated with gene repression correlate with reduced error rates), we quantified the relationship between transcription errors and two mutually exclusive modifications on lysine 9 of histone 3 (H3K9). At this location, histones can either be acetylated (H3K9ac), which is a mark associated with active transcription, or tri-methylated (H3K9me3), which is a mark associated with repressed transcription (27). We again found that the active H3K9ac mark was associated with an increased error rate, but in addition, we discovered that the repressive H3K9me3 mark was associated with a reduced error rate (Fig. 4A). Taken together, these observations suggest that the strength and fidelity of transcription are directly coupled to each other. One mechanism by which these parameters may be coupled to each other is the elongation speed of RNAPII: the faster an RNAP works, the less precise it is (17). To explore this hypothesis further, we examined the error rate near two histone marks that are associated with increased speed of elongation [H3K79me2 and H4K20me1 (28)]. Consistent with a role for elongation speed in transcriptional fidelity, we found that regions carrying these marks are transcribed less accurately than regions that do not (Fig. 4A).
Moreover, H3K79me2 and H4K20me1 marks tend to be lost near the 3′ end of genes (27), locations where elongation is known to slow down due to a pile-up of RNAPs that are in the process of transcription termination (29). Accordingly, these observations also support the idea that the increased fidelity of RNAPII around the stop codon is the result of reduced elongation rates.
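The comparison between regions that carry a given histone mark and regions that do not amounts to dividing the errors by the bases sequenced inside and outside the annotated peaks. The sketch below illustrates that calculation for a single chromosome. It is not the analysis code used in this study; the function name `rate_inside_outside`, the input layout, and the toy numbers are hypothetical.

```python
def rate_inside_outside(error_positions, coverage, peaks):
    """Compute errors per sequenced base inside and outside a set of peaks.

    error_positions: positions at which an error was detected.
    coverage:        dict mapping position -> number of times it was sequenced.
    peaks:           list of (start, end) intervals, end exclusive.
    """
    def in_peak(pos):
        return any(start <= pos < end for start, end in peaks)

    counts = {"inside": [0, 0], "outside": [0, 0]}  # [errors, bases sequenced]
    for pos, depth in coverage.items():
        counts["inside" if in_peak(pos) else "outside"][1] += depth
    for pos in error_positions:
        counts["inside" if in_peak(pos) else "outside"][0] += 1
    return {
        region: (errors / bases if bases else float("nan"))
        for region, (errors, bases) in counts.items()
    }


# Toy data: 100 positions, each sequenced 1,000 times, one peak at 20-40.
coverage = {pos: 1000 for pos in range(100)}
rates = rate_inside_outside([25, 30, 70], coverage, peaks=[(20, 40)])
print(rates)
```

In this toy example, two errors over 20,000 sequenced bases inside the peak are compared with one error over 80,000 sequenced bases outside of it. Normalizing by sequenced bases rather than by region length keeps highly expressed regions from inflating the apparent error rate.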
Fig. 4.
Epigenetic determinants of the error rate of transcription. A. Various histone modifications associated with active gene expression and RNAP elongation correlate with increased error rates, while markers that are associated with gene repression correlate with lower error rates. B and C. Open sequences of DNA tend to display increased error rates. Please note that the ATAC-seq peaks displayed in panel B have been normalized to the error rate to better display their qualitative overlap. D. Coding sequences that exhibit methylation levels associated with strong transcription display increased error rates. E and F. Genes that display strong transcription display higher error rates than those with weak transcription. In A, C, E, and F, each datapoint represents one of the six replicates.
To further investigate the link between gene expression and fidelity, we examined how DNA accessibility affects the error rate of transcription. In support of the idea that increased expression compromises the fidelity of transcription, we found that ATAC-peaks (locations along the genome that are accessible to transposon integration) closely correlate with increased error rates along the length of a gene (Fig. 4B), except for the increased fidelity seen at the 3′ end of the coding region. The more accessible the DNA is, the higher the error rate is for base substitutions, insertions, and deletions (Fig. 4C). This association is especially strong for specific types of errors like G→A substitutions, although other errors like U→C substitutions are anticorrelated with ATAC-peaks (SI Appendix, Fig. S8). More work will be needed to dissect the molecular basis for these observations. Another epigenetic mechanism associated with transcriptional activity is DNA methylation. Increased DNA methylation in the promoter or 5′ untranslated region (UTR) of a gene is usually associated with transcriptional repression. However, DNA methylation in the gene body has a parabolic relationship with gene expression, so that very little or highly abundant methylation is associated with lower levels of transcription, while intermediate methylation is associated with active transcription (30). Further supporting the idea that the strength of transcription is directly coupled to its fidelity, we found that the highest error rates occurred in genes with intermediate methylation levels in the coding sequence (Fig. 4D). We then corroborated this idea with a transcriptome-wide analysis of the ENCODE project dataset that showed that areas of strong transcription are associated with increased error rates for base substitutions, insertions, and deletions (Fig. 4 E and F).
BRCA1 Controls the Fidelity of Transcription at R-Loops.
In addition to histones and epigenetic marks, gene bodies are also covered by various DNA-binding proteins. To determine whether these proteins can affect the error rate of transcription, we monitored how the presence or absence of CCCTC-binding factor (CTCF), c-Myc, Nanog, SIRT6, RAD21, and BRCA1 impact transcriptional mutagenesis. We chose CTCF because of its versatile functions in transcription, chromatin structure, and V(D)J recombination (some of these functions may affect transcriptional mutagenesis), c-Myc because it is implicated in carcinogenesis (transcription errors are implicated in carcinogenesis as well), Nanog because of its role in stem cell maintenance, SIRT6 because of its role in the aging process (transcription errors are implicated in aging as well), and RAD21 and BRCA1 because of their role in DNA repair [DNA damage is a source of transcription errors (8, 31–33)]. Although most of these proteins had little or no effect on transcriptional fidelity (SI Appendix, Fig. S9), we found that locations marked by BRCA1 displayed a significant increase in the error rate of transcription (Fig. 5A). BRCA1 is best known as a DNA repair protein that plays a role in the suppression of breast cancer (34). However, BRCA1 also binds to R-loops (35), triple-stranded structures (36) that are highly sensitive to DNA damage (37). BRCA1 prevents these damaged structures from spreading across the genome and assists with local DNA repair processes if needed. Because DNA damage is a powerful source of transcription errors (38), we hypothesized that damaged R-loops could be responsible for the increased error rates detected at BRCA1 peaks. To explore this hypothesis, we carefully examined the error spectrum and found that at BRCA1 peaks, RNAPII displays a fivefold increase in G→A errors and a 16-fold increase in C→A errors (Fig. 5B). These errors are the two most common mistakes made by RNAPII on oxidatively damaged bases (39). 
The most common consequence of oxidative damage is cytosine deamination, which creates uracil and uracil glycol lesions that mispair with adenine during transcription to induce G→A errors (31, 32). Similarly, oxidative damage causes 8-oxo-guanine lesions that mispair with adenine to induce C→A errors (8). It was previously shown that by preventing damage-prone R-loops from spreading across the genome, BRCA1 limits mutagenesis in these structures. Our data now suggest that this function of BRCA1 also accomplishes a secondary goal, i.e., improving the fidelity of transcription.
Fig. 5.
BRCA1 is associated with altered error rates and RNA editing. A and B. Genetic locations where BRCA1 is known to bind display elevated error rates and altered spectra. C. A→G “errors” increase in the 5′ and 3′ UTRs of genes, where A→I editing is known to occur. D. The genetic context of errors that replace adenine changes in the 3′ UTR toward the genetic context preferred by ADAR1. E. Expression of the human ADAR1 gene results in an elevated A→G error rate.
Circle-Sequencing Allows for Ultrasensitive Detection of Off-Target RNA Editing.
In addition to G→A and C→A errors, we observed an increase in A→G errors at BRCA1 peaks (Fig. 5B). Interestingly, it was previously shown that R-loops at telomeres can be edited by ADAR1 (39), leading to A→I edits (40), and a large body of evidence now supports the idea that these RNA editing enzymes tend to exhibit off-target effects. Because inosine pairs with cytosine, such off-target editing events on genomic R-loops could be reported as A→G errors by our circle-sequencing assay. In addition, it should be noted that A→I editing events are also common in the 3′ UTR of genes (41). To test the idea that off-target A→I editing events are reported as A→G errors by our assay, we analyzed the location of A→G errors along gene bodies. We found that although this error is relatively rare in coding regions, it becomes increasingly prevalent in the 3′ UTR of human transcripts (Fig. 5C). No other error displays a similar distribution (SI Appendix, Fig. S10). In addition, we found that the genetic context of the errors that we detected on adenine bases in the 3′ UTR shifts toward the genetic context that is preferred by ADAR1, which prefers an adenine or uracil as its 5′ nearest neighbor and a guanine on its 3′ side (42) (Fig. 5D). Taken together, these observations suggest that circle-sequencing is detecting editing events that are performed by ADAR enzymes. To explore this hypothesis further, we expressed the human ADAR1 gene in the budding yeast Saccharomyces cerevisiae (an ADAR1 naïve system) and detected a clear increase in A→G errors (Fig. 5E). This observation unambiguously demonstrates that circle-sequencing can detect RNA editing events across the transcriptome, lending credence to the idea that the A→G errors detected at BRCA1 peaks are indeed off-target editing events.
Our bioinformatic pipeline contains safeguards to prevent editing events from being reported as transcription errors: to be included in our analysis, each base pair must be sequenced >200 times, and if an edit occurs in more than 0.5% of these reads, the site is automatically discarded. Editing events that occur below this threshold, however, can still be reported alongside true transcription errors. It is probably most prudent then to call these errors transcript errors, as opposed to transcription errors. Intriguingly though, editing errors are now recognized as a new type of “biological error” that is receiving increasing attention due to its potential physiological consequences on protein aggregation and the generation of unique, immunogenic epitopes (43). Importantly, transcription errors have similar effects on cells. Our observations now suggest that circle-sequencing could provide a new tool to catalog and analyze off-target editing across the transcriptome, so that its impact on human biology can be evaluated. Such experiments are especially important in light of the genetically engineered RNA editors that are currently being tested for therapeutic purposes in humans. Circle-sequencing could provide an essential quality-control mechanism to ensure that these editors result in as few off-target effects as possible.
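The two thresholds described above can be expressed as a simple per-site rule. The sketch below is a hedged illustration of that rule rather than the code of our pipeline; the function name `classify_site`, its parameter names, and the returned labels are hypothetical, while the cutoffs (>200 reads and >0.5% of reads) are taken from the text.

```python
def classify_site(total_reads, variant_reads,
                  min_coverage=200, max_variant_fraction=0.005):
    """Return 'low_coverage', 'discarded_as_edit', or 'retained' for one site."""
    if total_reads <= min_coverage:
        # The site must be sequenced more than 200 times to be analyzed.
        return "low_coverage"
    if variant_reads / total_reads > max_variant_fraction:
        # A variant seen in more than 0.5% of reads is treated as a
        # recurrent event (for example, an editing site) and discarded.
        return "discarded_as_edit"
    return "retained"


print(classify_site(150, 1))    # low_coverage
print(classify_site(1000, 10))  # discarded_as_edit (1% of reads)
print(classify_site(1000, 1))   # retained (0.1% of reads)
```

A rule of this kind removes sites that are edited frequently, but an off-target edit that affects only a small fraction of the transcripts at a site passes the filter, which is why such events can surface as A→G errors.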
Discussion
The accuracy of DNA replication, transcription, and translation forms the foundation of life itself. Together, these processes ensure the faithful inheritance and expression of our genetic code. It is essential, therefore, that we understand the molecular mechanisms that control the fidelity of these processes. Surprisingly though, relatively little is known about the fidelity of transcription, leaving a key component of life largely unexplored. To fill this gap in our knowledge, we measured the fidelity of transcription in hESCs. These cells have been extensively characterized by the human ENCODE project, allowing us to correlate the errors that we detected with various static and dynamic features of the genome. In the future, it will be useful to perform similar experiments to investigate whether similar features modulate transcriptional fidelity in differentiated cell types, including neurons and cardiomyocytes. In human stem cells though, we found that the error rate of transcription is remarkably variable across the transcriptome. The error rate not only differs between polymerases but also between classes of genes and even within specific regions of these genes. One surprising conclusion then is that human cells seem to prioritize the fidelity of some transcripts over others.
Our analyses suggest that they may do so by two separate mechanisms: by assembling RNAPII with or without fidelity factors, or by altering the speed of RNA synthesis. First, we found that the error spectrum of ribozyme RNAs closely mimics the error spectrum of Saccharomyces cerevisiae cells lacking the transcription elongation factor IIS (TFIIS) and Caenorhabditis elegans worms lacking the TFIIS homolog T24H10.1. Human cells carry three homologs of TFIIS and T24H10.1, labeled TCEA1, 2, and 3. Based on our data, we predict that these factors are not part of the RNAPII holoenzyme that transcribes ribozymes. In addition, we found that lncRNAs display an increased error rate for all possible base substitutions, insertions, and deletions, suggesting that the inherent fidelity of RNAPII itself has changed. Second, our observations suggest that genes might experience altered error rates as a consequence of changes in the speed of RNAPII elongation.
This phenomenon seems to be part of a broader narrative that suggests that the strength of transcription is directly linked to fidelity. Intriguingly, genes that are rarely transcribed tend to display lower error rates than genes that are highly transcribed. In principle, the improved fidelity of rarely transcribed genes could have an advantage for human cells, in that if few transcripts are made of a gene, an error in one of those transcripts could have an outsized impact on its function. However, it remains unclear if the power of natural selection is adequate to fine-tune transcription error rates on a gene-by-gene basis (15).
Our data also suggest that BRCA1 may have an additional, uncharacterized function in controlling the fidelity of transcription. BRCA1 is known to bind to R-loops (35), triple-stranded structures that are prone to the accumulation of DNA damage (36). BRCA1 prevents these R-loops from spreading (35), thereby limiting DNA damage from accumulating across the genome. In addition, BRCA1 is a DNA repair protein and can thus assist in the repair of these lesions to prevent mutations from arising during DNA replication. It now seems that these functions may have a secondary purpose as well: to improve the fidelity of transcription. DNA damage is a powerful source of transcription errors. In addition to alkylated bases, such as those caused by MNNG, other lesions can induce transcription errors as well, including 8-oxo-guanine, uracil glycol, uracil, abasic sites, 5-hydroxycytosine, and single-strand breaks. Thus, by curtailing R-loops and the spread of DNA damage across the genome, BRCA1 suppresses transcriptional mutagenesis as well. In this context, it is important to note that DNA damage-induced transcription errors were previously shown to activate oncogenic pathways in human cells (8). Accordingly, our observations raise the possibility that one additional mechanism by which BRCA1 suppresses human cancers is the prevention of transcriptional mutagenesis.
Another consequence of transcription errors is protein aggregation (1, 7, 9). It was previously shown that slippage events on dinucleotide tracts in the APP and UBB genes result in toxic peptides that are part of the neuropathological hallmarks that characterize Alzheimer’s disease. In addition, it was shown that the UBB+1 peptide is a potent inhibitor of the ubiquitin-proteasome system (3) and can be found in protein aggregates from other tauopathies and polyglutamine repeat disorders (1, 4–6), suggesting that errors in the UBB transcript could contribute to other protein misfolding diseases as well. Here, we build on these findings by demonstrating that mono- and di-nucleotide tracts, like those present in the APP and UBB transcripts, are especially dangerous because they display greatly increased error rates. In addition, we found that these types of slippage events occur most frequently in cells that are directly implicated in the progression of diseases characterized by protein misfolding, including Alzheimer’s and prion disease. Together, these observations suggest that these cells experience a constant, larger-than-normal stream of toxic peptides that promote disease progression by compromising global proteostasis and generating highly specific toxic peptides that are directly implicated in disease. The exact impact of transcription errors on protein-misfolding diseases, though, is likely dictated by several factors, including the transcripts that are affected, the cell types that make them, and the error rate of these cells. Because these factors are partially governed by chance, our observations raise the possibility that transcription errors could play a role in the heterogeneity and multifactorial etiology of Alzheimer’s disease and several other protein-misfolding diseases.
Methods
H1 hESC Culture.
H1 hESCs were purchased from WiCell in Wisconsin (WA01) and cultured in TeSR medium in Matrigel-coated 10-cm plates. Cells were grown at 5% O2 tension to better mimic the conditions inside the human body and reduce oxidative damage as a result of normoxic conditions. To passage the cells, and just prior to collection of RNA and DNA, cells were gently treated with 2 µg/mL Dispase mixed with Dulbecco's Modified Eagle Medium/F12, washed with phosphate buffered saline (PBS), and scraped off the plate using a glass pipette. DNA and RNA were then isolated with standard phenol chloroform and Trizol methods.
Library Construction and Sequencing.
Library preparation: 1,100 ng of enriched mRNA was fragmented with the NEBNext RNase III RNA Fragmentation Module (E6146S) for 25 min at 37 °C. RNA fragments were then purified with an Oligo Clean & Concentrator kit (D4061) by Zymo Research according to the manufacturer’s recommendations, except that the columns were washed twice instead of once. The fragmented RNA was then circularized with RNA ligase 1 in 20 µL reactions (NEB, M0204S) for 2 h at 25 °C, after which the circularized RNA was purified with the Oligo Clean & Concentrator kit (D4061) by Zymo Research. The circular RNA templates were then reverse transcribed in a rolling-circle reaction by first incubating the RNA for 10 min at 25 °C to allow the random hexamers used for priming to bind to the templates. Then, the reaction was shifted to 42 °C for 20 min to allow for primer extension and cDNA synthesis. Second strand synthesis and the remaining steps for library preparation were then performed with the NEBNext Ultra RNA Library Prep Kit for Illumina (E7530L) and the NEBNext Multiplex Oligos for Illumina (E7335S, E7500S) according to the manufacturer’s protocols. Briefly, cDNA templates were purified with the Oligo Clean & Concentrator kit (D4061) by Zymo Research and incubated with the second strand synthesis kit from NEB (E6111S). Double-stranded DNA was then entered into the end-repair module of the RNA Library Prep Kit for Illumina from NEB and size selected for 500 to 700 bp inserts using AMPure XP beads. These molecules were then amplified with Q5 PCR enzyme using 11 cycles of PCR, using a two-step protocol with 65 °C primer annealing and extension and 95 °C melting steps. Sequencing data were converted to industry standard Fastq files using BCL2FASTQ v1.8.4.
Error Identification.
We have developed a robust bioinformatics pipeline to analyze circ-seq datasets and identify transcription errors with high sensitivity (9, 44). First, tandem repeats are identified within each read (minimum repeat size: 30 nt, minimum identity between repeats: 90%), and a consensus sequence of the repeat unit is built. Next, the position that corresponds to the 5′ end of the RNA template is identified (the reverse transcription reaction is randomly primed, so cDNA copies can “start” anywhere on the template) by searching for the longest continuous mapping region. The consensus sequence is then reorganized to start from the 5′ end of the original RNA fragment, mapped against the genome with tophat (version 2.1.0 with bowtie 2.1.0), and all nonperfect hits go through a refining algorithm to search for the location of the 5′ end before being mapped again. Finally, every mapped nucleotide is inspected and must pass five checks to be retained: 1) It must be part of at least three repeats generated from the original RNA template; 2) all repeats must make the same base call; 3) the sum of all quality scores of this base must be >100; 4) it must be >2 nucleotides away from both ends of the consensus sequence; and 5) each base must be covered by ≥200 reads with <1% of these reads supporting a base call different from the reference genome. This final step filters out polymorphic sites and potential RNA-editing events. For example, if a base call is different from the reference genome, but is present in 100 out of 200 reads, it is not labeled as an error but as a heterozygous mutation. A similar rationale applies to low-level mutations and RNA editing events. Each read containing ≥1 mismatch is filtered through a second refining and mapping algorithm to ensure that errors in calling the position of the 5′ end cannot contribute to false positives.
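The consensus-building step can be illustrated with a minimal Python sketch. It makes two simplifying assumptions that the actual pipeline does not: the length of the repeat unit is already known, and the repeats contain no insertions or deletions, so that a read can be cut into equal-length copies. The function names are hypothetical.

```python
from collections import Counter

def split_repeats(read, unit_length):
    """Cut a rolling-circle read into consecutive full-length copies of the repeat unit."""
    last_start = len(read) - unit_length
    return [read[i:i + unit_length] for i in range(0, last_start + 1, unit_length)]

def build_consensus(copies):
    """Return the column-wise consensus and, per position, whether all copies agree."""
    consensus = []
    unanimous = []
    for column in zip(*copies):
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base)
        unanimous.append(count == len(column))
    return "".join(consensus), unanimous

# Toy read with three copies of a 4-nt unit; the second copy carries one discordant call
copies = split_repeats("ACGTACGAACGT", 4)
consensus, unanimous = build_consensus(copies)
```

A position at which the copies disagree, like the last position in this toy read, most likely reflects a reverse transcription or sequencing artifact rather than a change in the original RNA template, which is why the checks listed above require all repeats to make the same base call.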
The error rate is then calculated as the number of mismatches divided by the total number of bases that passed all quality thresholds.
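The five checks and the error-rate calculation can be sketched in Python as follows. The data structure, field names, and toy values are hypothetical and only illustrate the logic. The polymorphism and editing cutoff is exposed as a parameter and defaults to the 1% stated in this section.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class BaseObservation:
    reference_base: str
    repeat_calls: List[str]   # base call from each repeat of the same RNA template
    quality_sum: int          # summed quality scores of this base across the repeats
    dist_to_nearest_end: int  # nucleotides to the closer end of the consensus sequence
    site_coverage: int        # reads covering this genomic position
    site_nonref_reads: int    # reads at this position that differ from the reference

def passes_checks(obs, max_nonref_fraction=0.01):
    """Apply the five retention checks to one mapped nucleotide."""
    return (
        len(obs.repeat_calls) >= 3               # 1) at least three repeats
        and len(set(obs.repeat_calls)) == 1      # 2) all repeats make the same call
        and obs.quality_sum > 100                # 3) summed quality > 100
        and obs.dist_to_nearest_end > 2          # 4) > 2 nt from both ends
        and obs.site_coverage >= 200             # 5) covered by >= 200 reads ...
        and obs.site_nonref_reads / obs.site_coverage < max_nonref_fraction  # ... mostly reference
    )

def error_rate(observations, max_nonref_fraction=0.01):
    """Mismatches divided by the number of bases that passed all checks."""
    retained = [o for o in observations if passes_checks(o, max_nonref_fraction)]
    if not retained:
        return None
    mismatches = sum(1 for o in retained if o.repeat_calls[0] != o.reference_base)
    return mismatches / len(retained)

observations = [
    BaseObservation("A", ["A", "A", "A"], 120, 10, 400, 1),    # retained, matches reference
    BaseObservation("A", ["G", "G", "G"], 120, 10, 400, 1),    # retained, counted as an error
    BaseObservation("C", ["C", "C"], 120, 10, 400, 0),         # fails check 1
    BaseObservation("C", ["T", "T", "T"], 120, 10, 200, 100),  # fails check 5 (heterozygous site)
]
rate = error_rate(observations)
```

In the toy data, two bases are retained and one of them differs from the reference, so the rate is one error per two retained bases. Real datasets contain many millions of retained bases per error.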
Bioinformatic Analysis.
To determine the location of transcription errors along the length of genes, we first mapped each error to a distinct coordinate along the length of the human genome, using the 38th assembly of the human genome as our guide (GRCh38). We then generated the circle plots in Fig. 1 with the software “circa,” which was downloaded from https://omgenomics.com/circa/. Then, we cross-referenced these coordinates with the gene and transcript annotations provided by the 100th version of Ensembl. Once we knew the transcripts that were affected by each error, we determined the exact position of each error within these transcripts with the GenomicFeatures package in R. Finally, we separated the transcripts into their 5′UTR, coding region, and 3′UTR and divided these regions into bins. We then used the number of errors that were assigned to each bin to compute the error rate along the length of the entire transcript.
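The binning step can be sketched in Python as shown below. The published analysis was performed in R with GenomicFeatures; this sketch only illustrates the arithmetic. The number of bins per region, the region labels, and the input layout are assumptions made for the example.

```python
from collections import defaultdict

def bin_index(offset, region_length, n_bins):
    """Map a 0-based offset within a region to one of n_bins equal-width bins."""
    if not 0 <= offset < region_length:
        raise ValueError("offset must lie inside the region")
    return offset * n_bins // region_length

def binned_error_rates(error_sites, covered_sites, n_bins=10):
    """Errors per sequenced base for every (region, bin) pair.

    error_sites:   iterable of (region, offset, region_length), one entry per error
    covered_sites: iterable of (region, offset, region_length, depth)
    """
    errors = defaultdict(int)
    bases = defaultdict(int)
    for region, offset, length in error_sites:
        errors[(region, bin_index(offset, length, n_bins))] += 1
    for region, offset, length, depth in covered_sites:
        bases[(region, bin_index(offset, length, n_bins))] += depth
    return {key: errors.get(key, 0) / total for key, total in bases.items() if total > 0}

# Toy example: a 100-nt 3'UTR with two errors near its 3' end
covered = [("3UTR", 5, 100, 1000), ("3UTR", 95, 100, 1000)]
errors = [("3UTR", 95, 100), ("3UTR", 95, 100)]
rates = binned_error_rates(errors, covered)
```

Dividing by the number of sequenced bases in each bin, rather than reporting raw error counts, keeps bins with different coverage comparable.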
To determine the role of histone modifications and transcription factors on the error rate of transcription, we first downloaded the relevant histone modification and transcription factor chromatin immunoprecipitation sequencing (ChIP-seq) data from the ENCODE (https://www.encodeproject.org), NIH Roadmap Epigenomics Mapping Consortium (http://www.roadmapepigenomics.org/data/), and Cistrome databases (http://cistrome.org/). When multiple datasets were available for the same parameter, we compared them to each other and identified overlapping peaks that were shared between datasets. We then used these overlapping peak calls to separate transcription events as within or outside of the peak regions. We then calculated the error rates as above for both peak and nonpeak regions.
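One way to express the peak comparison in code is sketched below in Python. It assumes that peaks are half-open (start, end) intervals on a single chromosome, that the peaks within each dataset do not overlap one another, and that "shared" peaks are the coordinate intersection of two datasets. The exact overlap criterion used in the analysis is not specified in the text, so these choices and all names are illustrative.

```python
from bisect import bisect_right

def intersect_peaks(peaks_a, peaks_b):
    """Regions covered by a peak in both datasets (sorted, half-open intervals)."""
    a, b = sorted(peaks_a), sorted(peaks_b)
    i = j = 0
    shared = []
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            shared.append((start, end))
        # advance whichever interval ends first
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return shared

def in_peak(position, peaks):
    """True if a position falls inside one of the sorted, non-overlapping peaks."""
    starts = [start for start, _ in peaks]
    k = bisect_right(starts, position) - 1
    return k >= 0 and position < peaks[k][1]

def peak_vs_nonpeak_rates(error_positions, depth_by_position, peaks):
    """Error rate (errors per sequenced base) inside and outside the peaks."""
    errors = {True: 0, False: 0}
    bases = {True: 0, False: 0}
    for position, depth in depth_by_position.items():
        bases[in_peak(position, peaks)] += depth
    for position in error_positions:
        errors[in_peak(position, peaks)] += 1
    return {
        label: (errors[flag] / bases[flag] if bases[flag] else None)
        for label, flag in (("peak", True), ("nonpeak", False))
    }

shared = intersect_peaks([(100, 200), (300, 400)], [(150, 350)])
rates = peak_vs_nonpeak_rates([160], {160: 500, 250: 500}, shared)
```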
To determine the impact of DNA methylation on the error rate of transcription, we downloaded the processed, whole-genome bisulfite sequencing dataset in H1 hESCs from https://egg2.wustl.edu/roadmap/web_portal/processed_data.html#MethylData. Because this dataset was mapped onto a previous version of the human genome assembly (hg19), we first lifted it over to the current assembly (hg38). Then, we correlated the frequency of methylation at CpG sites with the error rate of transcription. To do so, we divided the methylation frequency of CpG sites into 11 bins, where bin 1 contains all sites that display no methylation and bins 2 to 11 contain CpG sites that are methylated in 10% increments, so that bin 2 = 0 to 10% methylation, bin 3 = 10 to 20% methylation, and so on. We then calculated the error rates for each of these bins as described above.
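The binning scheme can be sketched in Python as follows. The sketch assumes that methylation is available as integer read counts per CpG site and that a site lying exactly on a boundary (for example, 10%) belongs to the lower bin; neither detail is stated in the text, and the function names are hypothetical. Bin 1 holds unmethylated sites and the remaining ten bins each span a 10% increment.

```python
from collections import defaultdict

def methylation_bin(methylated_reads, total_reads):
    """Return the bin (1 to 11) for a CpG site: bin 1 = unmethylated, then 10% steps."""
    if total_reads <= 0 or not 0 <= methylated_reads <= total_reads:
        raise ValueError("read counts are inconsistent")
    if methylated_reads == 0:
        return 1
    # integer ceiling of 10 * methylated / total, shifted by one for the unmethylated bin
    return -((-10 * methylated_reads) // total_reads) + 1

def error_rate_by_bin(sites):
    """sites: iterable of (methylated_reads, total_reads, errors, bases_sequenced)."""
    errors = defaultdict(int)
    bases = defaultdict(int)
    for methylated, total, n_errors, n_bases in sites:
        key = methylation_bin(methylated, total)
        errors[key] += n_errors
        bases[key] += n_bases
    return {key: errors[key] / bases[key] for key in sorted(bases) if bases[key] > 0}

# Toy sites: unmethylated, 15% methylated, and fully methylated
rates = error_rate_by_bin([(0, 20, 1, 1000), (3, 20, 2, 1000), (20, 20, 4, 1000)])
```

Integer arithmetic is used for the bin boundaries to avoid floating-point rounding at exact multiples of 10%.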
To determine how strong or weak transcription affects the error rate, we downloaded the core 15-state model data from https://egg2.wustl.edu/roadmap/web_portal/chr_state_learning.html#core_15state. We then assigned the transcription errors and the number of times each base was covered to each core state and calculated the error frequency as above.
Construction of Reporter Mice.
The Cre A10G out-of-frame construct was synthesized by molecular cloning and genetic engineering and introduced into C57BL/6 embryonic stem (ES) cells by electroporation. Cre A10G cells were then selected for by neomycin treatment and injected into a white derivative of C57BL/6 mice. Pups showing evidence of incorporating the ES cells, as determined by a mosaic black and white coat pattern, were then bred to C57BL/6, and germline transmission of the Cre A10G construct was identified by PCR. These mice were subsequently crossed to an EYFP reporter mouse that was purchased from JAX (strain #006148).
Staining of Mouse Tissues.
Mice were kept on a 12-h light/dark cycle with food and water available ad libitum. Only male mice were used for experiments. Experimental mice were euthanized according to IACUC-approved guidelines by CO2 asphyxiation and cervical dislocation, and tissues were immediately collected and fixed in 10% neutral buffered formalin (15740-01, EMS) for 24 h at RT. Formalin-fixed tissues were rinsed with PBS and cryoprotected with 30% sucrose in PBS at 4 °C. Next, tissues were embedded with Tissue-Tek Optimal Cutting Temperature (OCT) compound (4583, Sakura Finetechnical) and rapidly frozen in chilled isopentane/2-methylbutane. Frozen OCT-embedded tissues were sectioned on a cryostat (Leica CM1860, Leica Biosystems) into 10-μm thick sections and collected on Superfrost Plus microscope slides (12-550-15, Fisher Scientific). For immunofluorescence, sections were incubated with rabbit anti-GFP primary antibodies (1:500, 600-401-215, Rockland, USA) overnight at 4 °C. Next, sections were rinsed with PBS and incubated with Alexa Fluor 488 goat anti-rabbit IgG secondary antibodies (1:400, A11034, Invitrogen) for 1 h at RT. Sections were subsequently rinsed with PBS and mounted with Vectashield mounting media with DAPI (H-1200, Vector laboratories). Slides were imaged on an ECHO microscope or a Zeiss LSM780 confocal. Tissue sections from Sox2Cre × YFPneo mice were used as positive controls. All animal experiments were performed in compliance with the guidelines set out by the Institutional Animal Care and Use Committee.
Supplementary Material
Appendix 01 (PDF)
Acknowledgments
This research was supported by grants from the US Department of Army, MURI award W911NF-14-1-0411 (M.L.), NIH, R35-GM122566-01 (M.L.), the NSF, DBI-2119963 (M.L.), and the National Institute on Aging, R01AG054641 NIA (M.L., J.-F.G., and M.V.).
Author contributions
M.K., B.A.B., Z.L., J.S., J.-F.G., and M.V. designed research; C.C., B.M.V., B.H., A.C., E.M., E.W., O.D.-S., J.L., M.-E.A., S.S., K.T., M.E., A.R., J.S. and M.V. performed research; J.S. contributed new reagents/analytic tools; X.Z., J.-F.G., and M.V. analyzed data; and M.L. and M.V. wrote the paper.
Competing interest
The authors declare no competing interest.
Footnotes
Reviewers: G.E.K., Western University of Health Sciences; and A.Y.M., Albert Einstein College of Medicine.
Data, Materials, and Software Availability
All the sequencing data generated from the H1 ESCs will be shared on the Sequence Read Archive (SRA), the primary NIH-funded archive for high throughput datasets. This data received the accession code BioProject ID PRJNA917136, and can be accessed at http://www.ncbi.nlm.nih.gov/bioproject/917136.
References
- 1. van Leeuwen F. W., et al., Frameshift mutants of beta amyloid precursor protein and ubiquitin-B in Alzheimer’s and Down patients. Science 279, 242–247 (1998).
- 2. van Leeuwen F. W., Burbach J. P., Hol E. M., Mutations in RNA: A first example of molecular misreading in Alzheimer’s disease. Trends Neurosci. 21, 331–335 (1998).
- 3. van Tijn P., et al., Dose-dependent inhibition of proteasome activity by a mutant ubiquitin associated with neurodegenerative disease. J. Cell Sci. 120, 1615–1623 (2007).
- 4. Verheijen B. M., Hashimoto T., Oyanagi K., van Leeuwen F. W., Deposition of mutant ubiquitin in parkinsonism-dementia complex of Guam. Acta Neuropathol. Commun. 5, 82 (2017).
- 5. de Pril R., et al., Accumulation of aberrant ubiquitin induces aggregate formation and cell death in polyglutamine diseases. Hum. Mol. Genet. 13, 1803–1813 (2004).
- 6. Fischer D. F., et al., Disease-specific accumulation of mutant ubiquitin as a marker for proteasomal dysfunction in the brain. FASEB J. 17, 2014–2024 (2003).
- 7. Vermulst M., et al., Transcription errors induce proteotoxic stress and shorten cellular lifespan. Nat. Commun. 6, 8065 (2015).
- 8. Saxowsky T. T., Meadows K. L., Klungland A., Doetsch P. W., 8-Oxoguanine-mediated transcriptional mutagenesis causes Ras activation in mammalian cells. Proc. Natl. Acad. Sci. U.S.A. 105, 18877–18882 (2008).
- 9. Gout J. F., et al., The landscape of transcription errors in eukaryotic cells. Sci. Adv. 3, e1701484 (2017).
- 10. Gordon A. J., et al., Transcriptional infidelity promotes heritable phenotypic change in a bistable gene network. PLoS Biol. 7, e44 (2009).
- 11. Consortium E. P., et al., Expanded encyclopaedias of DNA elements in the human and mouse genomes. Nature 583, 699–710 (2020).
- 12. Acevedo A., Brodsky L., Andino R., Mutational and fitness landscapes of an RNA virus revealed through population sequencing. Nature 505, 686–690 (2014).
- 13. Acevedo A., Andino R., Library preparation for highly accurate population sequencing of RNA viruses. Nat. Protoc. 9, 1760–1769 (2014).
- 14. Chung C., et al., BioProject ID PRJNA917136, SRA, http://www.ncbi.nlm.nih.gov/bioproject/917136. Accessed 12 December 2022.
- 15. Lynch M., et al., Genetic drift, selection and the evolution of the mutation rate. Nat. Rev. Genet. 17, 704–714 (2016).
- 16. Loh E., Salk J. J., Loeb L. A., Optimization of DNA polymerase mutation rates during bacterial evolution. Proc. Natl. Acad. Sci. U.S.A. 107, 1154–1159 (2010).
- 17. Bar-Nahum G., et al., A ratchet mechanism of transcription elongation and its control. Cell 120, 183–193 (2005).
- 18. Kaplan C. D., The architecture of RNA polymerase fidelity. BMC Biol. 8, 85 (2010).
- 19. Isken O., Maquat L. E., Quality control of eukaryotic mRNA: Safeguarding cells from abnormal mRNA function. Genes Dev. 21, 1833–1856 (2007).
- 20. Parsons M. A., Sinden R. R., Izban M. G., Transcriptional properties of RNA polymerase II within triplet repeat-containing DNA from the human myotonic dystrophy and fragile X loci. J. Biol. Chem. 273, 26998–27008 (1998).
- 21. Strathern J., et al., The fidelity of transcription: RPB1 (RPO21) mutations that increase transcriptional slippage in Saccharomyces cerevisiae. J. Biol. Chem. 288, 2689–2699 (2013).
- 22. Zhou Y. N., et al., Isolation and characterization of RNA polymerase rpoB mutations that alter transcription slippage during elongation in Escherichia coli. J. Biol. Chem. 288, 2700–2710 (2013).
- 23. Long J. M., Holtzman D. M., Alzheimer disease: An update on pathobiology and treatment strategies. Cell 179, 312–339 (2019).
- 24. Braak H., Braak E., Neuropathological stageing of Alzheimer-related changes. Acta Neuropathol. 82, 239–259 (1991).
- 25. Lee J. W., et al., Editing-defective tRNA synthetase causes protein misfolding and neurodegeneration. Nature 443, 50–55 (2006).
- 26. Ragagnin A., et al., Cerebellar compartmentation of prion pathogenesis. Brain Pathol. 28, 240–263 (2018).
- 27. Barth T. K., Imhof A., Fast signals and slow marks: The dynamics of histone modifications. Trends Biochem. Sci. 35, 618–626 (2010).
- 28. Veloso A., et al., Rate of elongation by RNA polymerase II is associated with specific gene features and epigenetic modifications. Genome Res. 24, 896–905 (2014).
- 29. Muniz L., Nicolas E., Trouche D., RNA polymerase II speed: A key player in controlling and adapting transcriptome composition. EMBO J. 40, e105740 (2021).
- 30. Zemach A., McDaniel I. E., Silva P., Zilberman D., Genome-wide evolutionary analysis of eukaryotic DNA methylation. Science 328, 916–919 (2010).
- 31. Bregeon D., Doddridge Z. A., You H. J., Weiss B., Doetsch P. W., Transcriptional mutagenesis induced by uracil and 8-oxoguanine in Escherichia coli. Mol. Cell 12, 959–970 (2003).
- 32. Viswanathan A., You H. J., Doetsch P. W., Phenotypic change caused by transcriptional bypass of uracil in nondividing cells. Science 284, 159–162 (1999).
- 33. Fritsch C., et al., Genome-wide surveillance of transcription errors in response to genotoxic stress. Proc. Natl. Acad. Sci. U.S.A. 118, e2004077118 (2021).
- 34. Moynahan M. E., Chiu J. W., Koller B. H., Jasin M., Brca1 controls homology-directed DNA repair. Mol. Cell 4, 511–518 (1999).
- 35. Zhang X., et al., Attenuation of RNA polymerase II pausing mitigates BRCA1-associated R-loop accumulation and tumorigenesis. Nat. Commun. 8, 15908 (2017).
- 36. Svejstrup J. Q., The interface between transcription and mechanisms maintaining genome integrity. Trends Biochem. Sci. 35, 333–338 (2010).
- 37. Aguilera A., Garcia-Muse T., R loops: From transcription byproducts to threats to genome stability. Mol. Cell 46, 115–124 (2012).
- 38. Saxowsky T. T., Doetsch P. W., RNA polymerase encounters with DNA damage: Transcription-coupled repair or transcriptional mutagenesis? Chem. Rev. 106, 474–488 (2006).
- 39. Shiromoto Y., Sakurai M., Minakuchi M., Ariyoshi K., Nishikura K., ADAR1 RNA editing enzyme regulates R-loop formation and genome stability at telomeres in cancer cells. Nat. Commun. 12, 1654 (2021).
- 40. Bazak L., et al., A-to-I RNA editing occurs at over a hundred million genomic sites, located in a majority of human genes. Genome Res. 24, 365–376 (2014).
- 41. Sun T., et al., Decoupling expression and editing preferences of ADAR1 p150 and p110 isoforms. Proc. Natl. Acad. Sci. U.S.A. 118, e2021757118 (2021).
- 42. Eggington J. M., Greene T., Bass B. L., Predicting sites of ADAR editing in double-stranded RNA. Nat. Commun. 2, 319 (2011).
- 43. Zhang M., et al., RNA editing derived epitopes function as cancer antigens to elicit immune responses. Nat. Commun. 9, 3919 (2018).
- 44. Fritsch C., Gout J. P., Vermulst M., Genome-wide surveillance of transcription errors in eukaryotic organisms. J. Vis. Exp. JoVE 57731 (2018), 10.3791/57731.





