Abstract
Post-transcriptional modifications are intrinsic to RNA structure and function. However, methods to sequence RNA typically require a cDNA intermediate and either cannot sequence these modifications or are tailored to a single specific nucleotide modification. Moreover, some of these modifications occur with <100% frequency at their particular sites, and site-specific quantification of their stoichiometries is a further challenge. Here, we report a direct method for sequencing tRNAPhe without cDNA by integrating a two-dimensional hydrophobic RNA end-labeling strategy with an anchor-based algorithm in mass spectrometry-based sequencing (2D-HELS-AA MS Seq). The entire tRNAPhe was sequenced, and the identity, location, and stoichiometry of all eleven different RNA modifications were determined. Five of these were not 100% modified, including a 2′-O-methylated G (Gm) in the wobble anticodon position as well as an N2,N2-dimethylguanosine (m22G), a 7-methylguanosine (m7G), a 1-methyladenosine (m1A), and a wybutosine (Y), suggesting numerous post-transcriptional regulations in tRNA. Two truncated isoforms at the 3′-CCA tail of the tRNAPhe (75 nt with a 3′-CC tail (80% abundance) and 74 nt with a 3′-C tail (3% abundance)) were identified in addition to the full-length 3′-CCA-tailed tRNAPhe (76 nt, 17% abundance). We discovered a new isoform with A─G transitions/editing at the 44 and 45 positions in the tRNAPhe variable loop, and discuss possible mechanisms related to the emergence and functions of the isoforms with these base transitions or editing. Our method revealed new isoforms, base modifications, and RNA editing as well as their stoichiometries in the tRNA that cannot be determined by current cDNA-based methods, opening new opportunities in the field of epitranscriptomics.
As an essential component of the protein synthesis machinery, transfer RNAs (tRNAs) are present in all living cells. More than 600 different tRNA sequences and a large breadth of different post-transcriptional base modifications have been reported.1,2 Despite their significance, structural and functional studies to understand the underlying biochemistry of tRNAs themselves have been hindered by the lack of efficient tRNA sequencing methods. tRNAs are the only class of small cellular RNAs that cannot be efficiently sequenced with current sequencing techniques,3 even though the first tRNA was sequenced as early as 1965.4 Typically, current methods used to sequence tRNAs are indirect and require complementary DNA (cDNA) intermediates. However, cDNA synthesis results in the loss of endogenous base modification information originally carried by RNAs. Thus, cDNA-based methods are not able to accurately sequence the complete set of rich and dynamic base modifications in tRNAs, which are an intrinsic part of their structure and function. Other methods that do not involve cDNA can detect base modifications, but these techniques usually require harsh sample treatments such as intensive enzymatic or chemical hydrolysis, resulting in loss of sequence ladders and spatial modification information.5-7
As opposed to indirect sequencing methods, direct sequencing of RNAs without cDNA synthesis would theoretically allow identification of both canonical and modified nucleotides in any RNA sample. Chemical and enzymatic degradation methods were previously developed to generate ladders for direct sequencing of RNA.8,9 However, these methods rely on different reactions specific to each of the canonical bases (A, C, G, and U) to generate sequence ladders as well as electrophoretic separation of the resultant ladders in the same gel for sequence determination, limiting their application primarily to sequencing canonical bases, and having very limited applications in identifying and locating various base modifications, e.g., those carried by tRNA.
One potential direct RNA sequencing method that can be generally applied to sequence both canonical and modified bases is mass spectrometry (MS)-based sequencing, which directly correlates the mass of each base, rather than proxies such as fluorescence10 or fluctuations in electric current,11 to sequence outputs. The four canonical ribonucleotides and most modified bases either have inherently unique masses or can be easily converted into different signature masses, allowing them to potentially be used for de novo MS-based direct RNA sequencing. Previously, top-down MS and tandem MS12-14 have been used in attempts to directly sequence tRNAs. However, these traditional MS methods have significant methodological inadequacies in the preparation of high-quality sequence ladders and subsequently in the reduction of complicated data, resulting in inaccurate base-calling.15,16 Therefore, a two-dimensional (2D) liquid chromatography (LC)-MS-based RNA sequencing method was established to produce easily identifiable mass-retention time (tR) ladders,16 allowing de novo sequencing of short single-stranded RNAs. We recently developed a hydrophobic end-labeling strategy (HELS) for 2D LC-MS-based RNA sequencing by introducing 2D mass-tR shifts for ladder identification of mixed RNAs,17 allowing for both the complete reading of a sequence from a single ladder and the sequencing of mixed RNA samples.16 Despite its success in model studies on short synthetic RNAs (~20 nt),17 it is still challenging to perform de novo MS sequencing of biological RNA such as tRNAs due to the data complexity caused by the increased length (typically ~76–90 nt) and abundance of base modifications. It is critical for the method to first demonstrate its feasibility to sequence real biological samples beyond previously assayed synthetic RNAs, as this will be a significant step forward to expand its applications in sequencing broader biological RNAs in the future. 
This helps to address the urgent demand in the field of epitranscriptomics for new technological development that can sequence all kinds of RNA modifications,18,19 not just for a specific modification or a few modifications.20
To process complex MS data derived from biological samples and to enable the method to sequence tRNAs, we developed a computational anchor-based algorithm that automates MS sequencing of RNAs. The signature tR-mass value of the hydrophobic tag specifies the exact starting data point, the anchor, from which the algorithm determines the data points corresponding to the desired ladder fragments needed for sequencing, significantly simplifying data reduction and enhancing the accuracy of sequence generation. The idea of using an anchor to identify sequence ladder start-points can be generalized and extended to any known chemical moiety beyond hydrophobic tags, e.g., PO4−, at the beginning of the tRNA or any nucleotide with a known mass. We can program its mass as a tag mass and use our anchor algorithm for sequencing, addressing the issue of MS data complexity of biological RNA samples and making 2D-HELS MS Seq more robust and accurate. Here, we integrated 2D-HELS with the anchor-based algorithm into mass spectrometry-based sequencing (2D-HELS-AA MS Seq), and successfully applied the method to sequence a complete yeast tRNAPhe, including all eleven different RNA modifications. Interestingly, some of these modifications occur with <100% frequency at their particular sites, and site-specific quantification of their stoichiometries is another challenge.19 In addition to sequencing, we also quantified the base modification stoichiometries of all eleven modifications together in a single study for the first time, and showed that five modifications are not 100% modified at their particular sites of the tRNA.
While directly sequencing the tRNA and its modifications, the new method also revealed two 3′-tail-truncated isoforms besides the full-length 3′-CCA-tailed tRNAPhe and helped to discover a new base transition isoform with A─G transitions/editing at the 44 and 45 positions in the variable loop, demonstrating another aspect of the method’s efficacy in detection, sequencing, and quantification of RNA isoforms. Our technological advance promotes better understanding of functions of post-transcriptional modifications and isoforms including their correlations to human diseases.
RESULTS AND DISCUSSION
Development of an Anchor-Based Algorithm for 2D-HELS-AA MS Seq.
To extend the application of the 2D-HELS MS Seq approach from short synthetic RNAs17 to allow sequencing of tRNA, we developed a computational anchor-based algorithm to automate MS sequencing of RNAs. Due to the complexity of MS data derived from the tRNA, it is very challenging to process all data in a single LC-MS run simultaneously. Instead, data preprocessing was used to select a particular subset of the input data set for the algorithm to focus on initially. This is feasible because a hydrophobic tag was added to the terminus of each RNA to be sequenced, where it remained even after acid degradation. Additionally, the trends of tR and mass of the tag-containing ladder fragments are known from our previous studies.16,17 In the 2D mass-tR plot of output LC-MS data sets, data points corresponding to tag-labeled RNA fragments are shifted spatially to a zone with larger tRs than those of their unlabeled counterparts, due to the tag’s hydrophobicity. Therefore, the algorithm can “zoom in” on one group, either labeled or unlabeled, in its specific zone of the 2D-plot, to read out the sequence of the selected group first. As such, we call the algorithm “anchor-based”, since it specifies the starting data point corresponding to the terminal tag, which latches down the data points corresponding to the specific ladder fragments that we aim to read out from the whole data set. The anchor-based algorithm significantly simplified the complicated MS data from the tRNA sample because it only read out the sequence for ladder fragments that had a hydrophobic tag or a specified tag with a known mass, and selectively filtered all non-tag/anchor-related data points out of the complicated MS data derived from the tRNA sample. More details related to the anchor algorithm are described in Supporting Information.
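The anchor idea described above can be illustrated with a minimal sketch. It is not the authors' implementation (which is described in the Supporting Information); it assumes that the LC-MS output has already been reduced to a list of (mass, tR) data points, that the mass of the tag-bearing starting fragment is known, and that only the four canonical residues are considered. The names `Peak`, `read_ladder`, `tol_da`, and `min_rt` are hypothetical, and modified nucleotides could be handled by adding their residue masses to the lookup table.

```python
from dataclasses import dataclass

# Monoisotopic masses (Da) of the canonical nucleotide residues in an RNA chain
# (nucleoside monophosphate minus water).
RESIDUE_MASS = {"A": 329.0525, "C": 305.0413, "G": 345.0474, "U": 306.0253}


@dataclass(frozen=True)
class Peak:
    mass: float  # deconvoluted monoisotopic mass, Da
    rt: float    # retention time, min


def read_ladder(peaks, anchor_mass, tol_da=0.01, min_rt=None):
    """Walk a mass ladder from the anchor and call one base per step.

    Hypothetical sketch: `min_rt` mimics the preprocessing step that keeps
    only the retention-time zone where tag-labeled fragments elute.
    """
    zone = [p for p in peaks if min_rt is None or p.rt >= min_rt]

    # Locate the anchor: the data point closest to the known tag mass.
    candidates = [p for p in zone if abs(p.mass - anchor_mass) <= tol_da]
    if not candidates:
        return ""
    current = min(candidates, key=lambda p: abs(p.mass - anchor_mass))

    sequence = []
    while True:
        best = None  # (mass error, base, peak)
        for base, delta in RESIDUE_MASS.items():
            target = current.mass + delta
            for p in zone:
                err = abs(p.mass - target)
                if err <= tol_da and (best is None or err < best[0]):
                    best = (err, base, p)
        if best is None:
            break  # no fragment extends the ladder by a known residue
        sequence.append(best[1])
        current = best[2]
    return "".join(sequence)
```

Because every step adds a positive residue mass, the walk always terminates. Data points that do not lie on a chain of known mass differences starting at the anchor are never visited, which is the filtering effect described in the text.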
2D-HELS-AA MS Seq of Yeast tRNA.
As it was only possible to read segments of up to 35 nt long with a 40K mass resolution LC-MS,17 we incorporated a partial RNase T1 digestion step to sequence a tRNA that was commercially available, resulting in a reduction of the 76 nt tRNA to segments of sequenceable sizes for 2D-HELS-AA MS Seq. Subsequently, we directly sequenced the entire tRNA with single-base resolution in one single LC-MS run (Figure 1). To further verify the complete tRNA sequence obtained from the single run above, we labeled the three segments partially digested from the tRNA by RNase T1 and separated them one by one for 2D-HELS-AA MS Seq in three separate LC-MS runs (Figure S1). To obtain overlapping segment sequences for assembling the complete tRNA sequence, we included MS data of the tRNA generated without RNase T1 digestion, i.e., 31 nt of the tRNA read from the 5′-end using a phosphate (PO4−) as the 5′-anchor, and 32 nt of the tRNA read from its 3′-end using a CCA tag as the 3′-anchor, respectively (Figure 1C). Taking all draft reads output by the anchor-based algorithm together (Table S1-S11), we assembled a full-length tRNA sequence which was a 100% match to the tRNAPhe reference sequence with more than 2× coverage (Figure 1C).
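The assembly of overlapping draft reads into one full-length sequence can be sketched as a simple suffix-prefix merge. This is an illustration under stated assumptions, not the procedure used for Figure 1C: the reads are assumed to be already ordered 5′ to 3′, error-free in the overlap, and written with one character per nucleotide. The function names and the `min_overlap` parameter are hypothetical.

```python
def merge_pair(left, right, min_overlap=3):
    """Merge two reads if a suffix of `left` equals a prefix of `right`."""
    if right in left:
        return left  # `right` adds no new sequence
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left[-k:] == right[:k]:
            return left + right[k:]
    return None


def assemble(reads, min_overlap=3):
    """Greedily assemble reads that are given in 5'-to-3' order."""
    contig = reads[0]
    for read in reads[1:]:
        merged = merge_pair(contig, read, min_overlap)
        if merged is None:
            raise ValueError("no sufficient overlap between adjacent reads")
        contig = merged
    return contig


# Toy reads for illustration only (not the reads reported in Tables S1-S11).
toy_reads = ["GCGGAUUUAG", "UUUAGCUCAG", "CUCAGUUGG"]
full_length = assemble(toy_reads)
```

Trying the longest overlap first keeps the merge conservative: two reads are only joined where they agree over at least `min_overlap` positions.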
Figure 1.
2D-HELS-AA MS Seq of Yeast tRNAPhe. (A) (1–6): Sequencing workflow. More detailed information can be found in Materials and Methods in Supporting Information. (B) 2D plot of the entire tRNA sequenced from a single LC-MS run, showing the identity and location of all modifications. (C) Assembly of the full-length tRNAPhe sequence based on overlapping sequence reads from different LC-MS runs, showing 100% coverage and accuracy as compared to the reported tRNAPhe reference sequence. All output sequence reads are converted to FASTA format in the 5′- to 3′-order (44 and 45 AG conversion output reads not included). *Ts: the Supporting Information Table S where the sequencing data of that particular strand can be found.
Sequencing of All Eleven RNA Modifications.
During sequencing of the tRNA, we also successfully identified and located all eleven RNA modifications within the tRNA (Figure 2). Four of these modifications could be directly read out by their unique masses: dihydrouridine (D) at positions 16 and 17, N2,N2-dimethylguanosine (m22G) at position 26, 5-methylcytidine (m5C) at position 40, and 5-methyluridine (T) at position 54. Methylation on the 2′-OH of C (Cm) at position 32 and G (Gm) at position 34 renders the adjacent 3′,5′ phosphodiester linkage nonhydrolyzable, creating a mass gap larger than 1 nt in both the 5′- and the 3′-mass ladder families16 (Figure 1B). This gap can be filled in by collision induced dissociation (CID) MS, which determines which of the two unhydrolyzable nucleotides is methylated16 (Figure 2C and Figure S2). However, other RNA modifications such as pseudouridine (ψ) and U, N2-methylguanosine (m2G) and 7-methylguanosine (m7G), and 1-methyladenosine (m1A) and N6-methyladenosine (m6A) share identical masses, and LC-MS alone cannot distinguish them. Additional enzymatic/chemical reactions were required to identify each base at its particular site and differentiate it from its corresponding isomers with an identical mass, as shown in Figure 2C. To differentiate m1A at position 58 from its m6A isomer,21 we designed a reverse transcription/single base extension experiment (rtSBE), which exploits the fact that m6A, but not m1A, is able to form base-pairing interactions, so that reverse transcription pauses at any m1A.22 The rtSBE results proved that the nucleotide at position 58 is m1A and not m6A (Figure S3). A demethylation experiment, which employed ALKBH3, an m1A and m3C demethylase of tRNA,21 to convert m1A to A in tRNAPhe, followed by incorporation of ddT as shown by a positive MALDI result, further confirmed that the nucleotide at position 58 is m1A. In the absence of ALKBH3, we did not observe the ddT incorporation.
To differentiate ψ from U, we treated the RNA with N-cyclohexyl-N′-(2-morpholinoethyl)-carbodiimide metho-p-toluenesulfonate (CMC) to convert ψ to its CMC adduct,23 which has a different mass than U/ψ. The CMC-converted ψ (depicted as ψ*) results in a shift in both tR and mass, allowing facile identification and location of ψ at positions 39 and 55 due to a single drastic shift each in the mass-tR ladder at these sites17 (Figure S4 and Tables S12-S17). To differentiate m7G at position 46 from its isomeric m2G at position 10, we treated the tRNA with borohydride (NaBH4) and aniline sequentially to generate a site-specific cleavage right after m7G.24,25 The three major mass fragments observed by LC-MS after the cleavage were all a result of cleavage at m7G, but in three isoforms with 3′-tails of C, CC, or CCA, respectively (Figure S5), indicating that there is only one m7G in the tRNA. We did not observe a mass fragment induced by a cleavage at m2G at position 10 from either the 5′-end or the 3′-end. However, the mass fragment from the 5′-end to m7G at position 46 after the cleavage (45 nt long) was not observed, probably due to the mass resolution limitation.17 No other mass fragments were observed. The unique masses of the cleaved 3′-tailed segments were used to differentiate m7G at position 46 from m2G at position 10, which cannot be cleaved under the same reaction conditions.
Figure 2.
Sequencing of all eleven RNA modifications. (A) Proposed mechanism for the conversion of wybutosine (Y) to its depurinated form (Y′) in acidic conditions. (B) The mass of Y was found in the crude products after acid degradation. The relative percentages of Y and Y′ were quantified and can be found in Table S18. (C) Summary of all eleven RNA modifications sequenced by 2D-HELS-AA MS Seq. The relative percentages of modifications at each position were quantified by integrating the EIC peaks of their corresponding ladder fragments (Table S19). The percentages of partially modified nucleotides are highlighted in pink.
The primary task for sequencing is to determine the precise order of the four nucleotides. Our method thus extends this capacity to include nucleotide modifications beyond the four canonical nucleotides, based on the unique mass of each RNA modification, and we used this approach to expand the application of the technique beyond the synthetic RNA samples we examined previously, to directly sequence biological samples for the first time. Only in the case where modifications have isomers with identical masses but different chemical structures would one require a further RNA modification characterization method to differentiate these isomers following our 2D-HELS-AA MS Seq approach. However, the advantage of our method is that we already know the mass of the particular nucleotide modification and its location/order without any prior sequence knowledge. This is very different than other RNA characterizing methods that can identify RNA modifications, but must still rely on additional established sequencing methods for sequence/location information.18,26-28
Stoichiometric Quantification of All Eleven RNA Modifications.
Relative stoichiometries/percentages of modified RNA vs nonmodified counterpart RNA can be quantified in partially modified synthetic RNA samples by our technique,17 and thus stoichiometries/relative percentages of all eleven RNA modifications were quantified at each position of the tRNA (Table S19), five of which were not 100% modified (Figure 2C). The data suggest that there is an abundance of post-transcriptional regulation that can occur in the tRNA at these different positions. For example, the wobble Gm at position 34 was partially modified (60% Gm vs 40% G), which has important regulatory implications since a lack of Gm could affect binding or stalling in the ribosome.29 2′-O-Methylation is essential for accurate and efficient protein synthesis, and a decreased level of 2′-O-methylation could lead to an increase in translational infidelity.30,31
Our method also revealed unexpected nucleotides in tRNA. Position 26 in tRNAPhe is thought to be m22G;32-34 however, we found clear evidence that G coexists at this position, but there is no evidence for any monomethylated G (mG) coexisting at this position. The stoichiometries were quantified by integrating extracted-ion current (EIC) peaks of their corresponding ladder fragments,17,35 which revealed that m22G and G were present at 58% and 42%, respectively (Figure 2C). Also, both m7G at position 46 (46% m7G vs 54% G) in the variable loop and m1A at position 58 (94% m1A vs 6% A) in the TψC loop were partially modified (Figure 2C), suggesting that the methylation process is highly regulated.36 To the best of our knowledge, this is the first time the stoichiometry, identity, and location of these different RNA modifications were all directly measured together in a single study, something no currently available sequencing technology is capable of, thus providing unique insights that call for further functional studies of these dynamic RNA modifications.37
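The arithmetic behind a relative stoichiometry obtained from EIC peak integration can be sketched as follows. This is a minimal illustration, not the quantification pipeline used for Table S19. It assumes that each EIC trace is available as paired time/intensity lists and that the modified and unmodified ladder fragments respond similarly in the mass spectrometer, which is an assumption of this sketch. The function names are hypothetical.

```python
def integrate_eic(times, intensities):
    """Trapezoidal integration of an extracted-ion current trace."""
    if len(times) != len(intensities):
        raise ValueError("times and intensities must have the same length")
    area = 0.0
    for i in range(1, len(times)):
        width = times[i] - times[i - 1]
        area += 0.5 * (intensities[i] + intensities[i - 1]) * width
    return area


def stoichiometry(areas):
    """Convert integrated peak areas into relative percentages."""
    total = sum(areas.values())
    if total <= 0:
        raise ValueError("peak areas must sum to a positive value")
    return {name: 100.0 * area / total for name, area in areas.items()}


# Toy traces for illustration only; the values are not measured data.
modified = integrate_eic([0.0, 1.0, 2.0], [0.0, 6.0, 0.0])
unmodified = integrate_eic([0.0, 1.0, 2.0], [0.0, 2.0, 0.0])
percentages = stoichiometry({"modified": modified, "unmodified": unmodified})
```

Each percentage is simply a fragment's area divided by the summed area of all species competing for the same position, so the values at one site always add up to 100%.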
Identification and Quantification of a Dynamic Change from Y to Its Depurinated Y′ Form.
Upon analysis of the sequencing results, we noticed that the wybutosine (Y) at position 37 was converted to its depurinated product Y′ (ribose form) under acidic degradation conditions (Figure 2).38,39 Without acid degradation, only 10% of the tRNA contained the depurinated Y′ form at this position, while 90% contained the standard Y form of the base (Table S18). However, no Y form was observed in any ladder fragments containing this position after acid degradation, and all of the Y bases were converted to Y′ due to depurination in the acidic conditions (Figure 2A). As another piece of evidence of the depurination, a mass of 376.1178 Da, corresponding to a cleaved Y nucleobase, was found in the crude products after acid degradation and subsequent MS analysis (Figure 2B), suggesting that Y′ was originally carried by the tRNA. The fact that our method can identify the dynamic change of Y to Y′ and quantify the relative Y/Y′ ratio could be useful for potential diagnostic assays, as such changes in the Y′/Y ratio could be used as a potential biomarker, e.g., in certain nervous system diseases,40 where the common characteristics are decreased pH at both the tissue and cellular levels. Based on the same principle, our method could potentially probe dynamic changes of other base modifications, acid-labile or not, and quantify variations in their ratios in particular cells or tissues subjected to different biological processes or disease conditions.
Identification and Quantification of Two Other Truncation Isoforms (74 nt and 75 nt) at the 3′-End.
Unlike its nominal identity according to the supplier, upon sequencing, the commercially prepared tRNAPhe (phenylalanine specific from brewer’s yeast) sample was revealed to be heterogeneous. When analyzing a biotinylated 3′-segment of the tRNA (58m1A-76A), we found that there is more than one ladder that has the biotin tag as shown in Figure 3A, indicating that this segment contains more than one sequence. Besides the 76 nt tRNA with a complete post-transcriptionally modified CCA tail, two other incomplete isoforms of the tRNA that are missing an A and a CA at the 3′-CCA tail, respectively, were further identified in a 3′-segment of the tRNA (58m1A-76A) (Figure 3) using the anchor algorithm and a revised Smith-Waterman alignment similarity algorithm (details in Materials and Methods). Surprisingly, the most abundant component was not the nominal 76 nt tRNAPhe, which comprised only 17% of the sample as calculated by integration of the corresponding EIC peak (Table S24). Rather, the 75 nt tRNAPhe with a missing A at the 3′-end was the major component of the sample at 80%, while the 74 nt tRNAPhe with a missing CA at the 3′-end was a minor component at 3%. The two tail-truncation isoforms cannot be degraded products of longer tRNAs like the 76 nt tRNAPhe; otherwise, they would not contain the free 3′-OH required for the 2D HELS chemistry.17 The data suggest that 2D-HELS MS Seq is not only able to sequence modified RNA, but can also identify tail-truncation isoforms that were primarily only studied by polyacrylamide gel electrophoresis methods previously.41 As stress-induced tRNA fragmentation has been implicated in cancers and other diseases,42 further studies into the relationship between the relative abundances of tRNA tail-truncation isoforms and various diseases will assist in understanding the potential role of such isoforms in disease-related biological processes and subsequent treatments.43
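For readers unfamiliar with the alignment step mentioned above, the following is a sketch of the standard, textbook Smith-Waterman local alignment score. It is not the revised version used in this work (see Materials and Methods), and the scoring parameters `match`, `mismatch`, and `gap` are arbitrary illustrative choices.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] holds the best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best


# Toy 3'-tails: full-length CCA versus the CC and C truncations.
scores = {tail: smith_waterman("ACCA", "A" + tail) for tail in ("CCA", "CC", "C")}
```

With these toy tails, the score drops by one `match` value for each missing 3′-nucleotide, which illustrates how a similarity score can rank draft reads of truncated isoforms against a reference.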
Figure 3.
Identification of 3′-truncation isoforms. (A) 2D-HELS-AA MS sequencing of segment III, showing two other truncated isoforms of tRNAPhe at the 3′-end (74 nt and 75 nt). tR was normalized for ease of visualization of the 74 nt and 75 nt isoforms. (B) Terminal base of 76 nt tRNAPhe and its two tail-truncated isoforms; all three isoforms contain a free OH at the 3′-end, which is required for introducing the biotin tag, suggesting that the isoforms were not generated during acid degradation but came together with the full-length 76 nt tRNA originally.
Discovering a New 44g45a Isoform at the tRNA’s Variable Loop.
We also observed a new isoform with an A to G transition at position 44 and a G to A transition at position 45, i.e., a 44A45G (wild-type, reported previously)44 to 44g45a transition. Note that the lower-case letters “g” and “a” in the isoform “44g45a” represent isomeric nucleotides that share an identical mass with the canonical nucleotides G and A, respectively, but whose exact structures remain to be confirmed. These two reads were revealed first by our anchor-based algorithm, and further verified manually in the original MFE files (Figure 4, Tables S4-S5, S8-S9, and S19-S22). We identified two distinct mass ladder fragments at position 44 when reading from the 5′-direction, corresponding to sequences containing 44A and 44g that are simultaneously present. However, these two mass ladder fragments merged into one mass ladder fragment at position 45. Such an effect could only occur if the two sequences contained a 45G or a 45a, respectively, thus confirming the coexistence of two isoforms (Figure 4A). This is consistent with the sequencing results obtained when reading from the opposite direction in bidirectional sequencing16 (Figure 4C). We observed these two isoforms in all reads which covered positions 44 and 45, and their relative percentages were consistent (~50% for wild-type, quantified by EIC) (Table S25). To further verify the coexistence of the two mass fragments, we employed full-spectral analysis provided by the commercial MassWorks software (Cerno Bioscience, Las Vegas, USA) to examine the corresponding ions of these two fragments simultaneously in one spectrum. When reading from the 5′-direction, two ions (m/z 778.1051 and 779.7068, both carrying 10 charges) were found, corresponding to 44A and 44g. Full-spectral analysis also confirmed that 45G and 45a coexist when reading from the 3′-direction (Figure 4D).
Furthermore, the ratios of 44A/44g as compared to 45G/45a quantified by the full spectral analysis45 are consistent (Figure 4), indicating that the sequenced 44g and the 45a are indeed from the same RNA strand, while the 44A and 45G are also both from the same RNA strand. All these MS results support the existence of a new isoform, with the sequence 44g45a, coexisting with the wild-type RNA that contains the 44A45G sequence, and that these two isoforms occur at similar levels. To further confirm the coexistence of these two isoforms, we performed an rtSBE on the tRNAPhe sample. For example, if tRNAPhe has an A/g single-nucleotide polymorphism (SNP) at position 44, then the rtSBE assay would be able to incorporate both ddT and ddC, since the two isoforms exist at similar levels. However, the results showed that only ddT could be incorporated at position 44 (Figure S6A) and only ddC could be incorporated at position 45 (Figure S6B), indicating that the wild-type 44A45G was the only isoform present. The rtSBE results suggested that RNA reverse transcriptase could not recognize these edited bases well. It is also possible that the mass differences observed in the above A─G transitions at positions 44 and 45 may be caused by oxidation and reduction, e.g., oxidation of A to 2-oxoadenine (isoG) and/or 8-oxoadenine (8-oxo-A) at position 44 (Figure S7A), which both have a mass identical to G and would still allow canonical T incorporation. Complete acid digestion of the tRNA into single nucleotides followed by LC-MS analysis supports this possibility, as we found two different tRs in the EIC profile of the G monophosphate (Figure S7B), suggesting a coexisting nucleotide of the same mass as G, but a different structure. A similar mechanism could explain the putative G to A transition/editing at position 45. These findings raise interesting questions about how these newly identified isoforms emerge and affect the function of the tRNA.
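The statement that an oxidized adenine is indistinguishable from G by mass can be checked with a short calculation from elemental compositions. The sketch below uses standard monoisotopic atomic masses and the nucleobase formulas C5H5N5 (adenine) and C5H5N5O (guanine, and likewise isoguanine and 8-oxoadenine). The helper name `formula_mass` is hypothetical, and the calculation concerns the free nucleobases only, not the measured ladder fragments.

```python
# Monoisotopic atomic masses (Da) of the most abundant isotopes.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}


def formula_mass(formula):
    """Monoisotopic mass of an elemental composition given as {element: count}."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


ADENINE = {"C": 5, "H": 5, "N": 5}
GUANINE = {"C": 5, "H": 5, "N": 5, "O": 1}
# Isoguanine (2-oxoadenine) and 8-oxoadenine share the elemental formula of guanine.
OXO_ADENINE = {"C": 5, "H": 5, "N": 5, "O": 1}

# Adding one oxygen to adenine reproduces the mass of guanine exactly.
delta_g_vs_a = formula_mass(GUANINE) - formula_mass(ADENINE)
delta_g_vs_oxo_a = formula_mass(GUANINE) - formula_mass(OXO_ADENINE)
```

Since the two compositions are identical, no gain in mass resolution can separate G from an oxidized A; only an orthogonal property such as retention time or a chemical/enzymatic assay can, which is consistent with the two tRs reported for the G monophosphate.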
Figure 4.
Discovering a new 44g45a isoform in the tRNA variable loop. (A) Schematic of sequence ladder fragments showing that a transition/editing g (sharing an identical mass with G) coexists with A at position 44 when reading from the 5′-direction (Tables S4-S5 and S8-S9). (B) Least squares fit of the theoretical mass spectrum to the calibrated mass spectrum (tR = 31.9–32.9 min) when reading from the 5′-direction. The full-spectral analysis confirms that the ions of the 44g and 44A fragments (with 10 charges) coexist and their relative abundances are 57% and 43%, respectively. The theoretical trace (black) of the two combined ion profiles fits well with the calibrated mass spectrum as observed (red), resulting in a good spectral accuracy of 87%. (C) Single transition/edited a (one oxygen less than G) coexists with G at position 45 when reading from the 3′-direction (Tables S19-S22). (D) Similar to B, full-spectral analysis confirms that the ions of the 45a and 45G fragments (both with four charges; spectral accuracy: 71%) also coexist and their relative abundances are 47% and 53%, respectively, when reading from the 3′-direction (tR = 16.5–18.6 min).
CONCLUSION
The 2D-HELS-AA MS Seq technique expands RNA sequencing capacity beyond the four canonical ribonucleotides, and is able to determine the precise order of both canonical nucleotides and nucleotide modifications, potentially including any modification that an LC-MS instrument can detect. Unlike other successful sequencing technologies, we rely on mass differences of two adjacent ladder fragments to report identities of both canonical nucleotides and chemical modifications. Mass is an intrinsic nucleotide property that can be used to identify both known and unknown RNA modifications. This is very different than the use of proxies such as fluorescence or electronic signatures to report the identity of the four canonical nucleotides, which has limited capacity in discovering new and unknown base modifications. It is worth emphasizing that our method is a sequencing method, which provides both identification and location information on each nucleotide, canonical or not. This is very different than other RNA identification/characterization methods, which can only indicate the identity of RNA modifications but must rely on complementary established sequencing methods for sequence/location information. The primary purpose of the current work is to expand the sequencing capacity of this approach beyond the synthetic RNAs we reported on previously17 to achieve direct and de novo sequencing of biological RNA molecules like tRNAPhe. Further characterization of RNA modifications was only needed when there were isomeric modifications that could not be differentiated by mass alone. We do not claim that our method can replace standard structural verification methods such as NMR, X-ray crystallography, and other chemical and enzymatic approaches that are specific to individual nucleotide modifications, which are designed to assess the chemical structure of such base modifications.
Rather, these reliable methods are essential to further confirm the exact chemical structures of nucleotide modifications that we have revealed initially by their unique masses, such as isomeric base modifications.
Chemically, all RNAs consist of phosphodiester bonds that can be cleaved to generate mass ladders for our 2D-HELS-AA MS Seq. In this seminal study, the focus was to demonstrate that the approach is not limited to short synthetic RNAs (<35 nt) as described previously;17 we can indeed use it to sequence real biological samples such as tRNAs. However, in practice, the types of RNA that can be sequenced by this method are determined not only by our acid degradation chemistry for mass ladder generation, but also by the capacity of the LC-MS instrument to detect these mass ladders. The upper limit of RNA size that will give adequate resolution is LC-MS instrument-dependent, and the lower limit of the RNA sample loading amount also depends on instrument sensitivity. Both limits remain to be determined and will affect the utility of the approach. However, we are aiming to develop a general method that every user can tailor to their own instruments. Clearly, higher end LC-MS instruments provide higher mass resolutions (likely leading to higher read length) and/or higher sensitivity (likely leading to lower sample requirement). Once the method is fully developed, it will not be necessary for every end user to have a top-of-the-line instrument, since almost certainly companies offering the service will emerge, similar to many current vendors that provide NGS services. Nonetheless, the results of the 2D-HELS-AA MS Seq study revealed new isoforms, RNA base modifications, and editing, as well as their stoichiometries, in the tRNA that cannot be determined by cDNA-based methods (Figure 5), opening new opportunities in the field of epitranscriptomics.
Figure 5.
Summary of the different RNA isoforms, base modifications, and base editing, as well as their stoichiometries, in the tRNAPhe.
ACKNOWLEDGMENTS
The authors acknowledge an R21 grant from the USA National Institutes of Health (R21HG009576) to S. Zhang and W. Li and New York Institute of Technology (NYIT) Institutional Support for Research and Creativity grants to S. Zhang, which supported this work. The authors would also like to thank M. Hadjiargyrou (NYIT); J. Ju (Columbia University); S. Kumar, X. Li, S. Jockusch, and other members of the Ju lab (Columbia University); Y. Wang (Cerno Bioscience); and M. Aziz (NYIT) for helpful discussions and suggestions for our manuscript. T. Z. Jia is a researcher at the Earth-Life Science Institute (ELSI) at Tokyo Institute of Technology, which is supported under the World Premier International Research Center (WPI) Initiative of the Japan Ministry of Education, Culture, Sports, Science and Technology.
Footnotes
Supporting Information
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acschembio.0c00119.
Materials and Methods with associated figures and data tables (PDF)
The authors declare the following competing financial interest(s): The authors have filed a provisional patent related to the technology discussed in this manuscript.
Data and Code Availability: Related MFE data and the anchor-based algorithm (including both the web-based sequencing application and the source code) are available upon request and have been uploaded to GitHub (https://github.com/rnamodifications/seqapp).
Contributor Information
Ning Zhang, Department of Biological and Chemical Sciences, New York Institute of Technology, New York, New York 10023, United States; Department of Chemical Engineering, Columbia University, New York, New York 10027, United States.
Shundi Shi, Department of Chemical Engineering, Columbia University, New York, New York 10027, United States.
Xuanting Wang, Department of Chemical Engineering, Columbia University, New York, New York 10027, United States.
Wenhao Ni, Department of Biological and Chemical Sciences, New York Institute of Technology, New York, New York 10023, United States.
Xiaohong Yuan, Department of Biological and Chemical Sciences, New York Institute of Technology, New York, New York 10023, United States.
Jiachen Duan, Department of Biological and Chemical Sciences, New York Institute of Technology, New York, New York 10023, United States.
Tony Z. Jia, Earth-Life Science Institute, Tokyo Institute of Technology, Meguro-ku, Tokyo 152-8550, Japan; Blue Marble Space Institute of Science, Seattle, Washington 98154, United States.
Barney Yoo, Department of Chemistry, Hunter College, City University of New York, New York, New York 10065, United States.
Ashley Ziegler, Department of Biological and Chemical Sciences, New York Institute of Technology, New York, New York 10023, United States.
James J. Russo, Department of Chemical Engineering, Columbia University, New York, New York 10027, United States.
Wenjia Li, Department of Computer Science, New York Institute of Technology, New York, New York 10023, United States.
Shenglong Zhang, Department of Biological and Chemical Sciences, New York Institute of Technology, New York, New York 10023, United States.
REFERENCES
- (1) Lorenz C, Lunse CE, and Morl M (2017) tRNA Modifications: Impact on Structure and Thermal Adaptation. Biomolecules 7, E35.
- (2) Bou-Nader C, Montemont H, Guerineau V, Jean-Jean O, Bregeon D, and Hamdane D (2018) Unveiling structural and functional divergences of bacterial tRNA dihydrouridine synthases: perspectives on the evolution scenario. Nucleic Acids Res. 46, 1386–1394.
- (3) Zheng G, Qin Y, Clark WC, Dai Q, Yi C, He C, Lambowitz AM, and Pan T (2015) Efficient and quantitative high-throughput tRNA sequencing. Nat. Methods 12, 835–837.
- (4) Holley RW, Apgar J, Everett GA, Madison JT, Marquisee M, Merrill SH, Penswick JR, and Zamir A (1965) Structure of a Ribonucleic Acid. Science 147, 1462–1465.
- (5) RajBhandary UL, and Kohrer C (2006) Early days of tRNA research: discovery, function, purification and sequence analysis. J. Biosci 31, 439–451.
- (6) Carell T, Brandmayr C, Hienzsch A, Muller M, Pearson D, Reiter V, Thoma I, Thumbs P, and Wagner M (2012) Structure and function of noncanonical nucleobases. Angew. Chem., Int. Ed 51, 7110–7131.
- (7) Kellner S, Burhenne J, and Helm M (2010) Detection of RNA modifications. RNA Biol. 7, 237–247.
- (8) Peattie DA (1979) Direct chemical method for sequencing RNA. Proc. Natl. Acad. Sci. U. S. A 76, 1760–1764.
- (9) Guo PX, Bailey S, Bodley JW, and Anderson D (1987) Characterization of the small RNA of the bacteriophage phi 29 DNA packaging machine. Nucleic Acids Res. 15, 7081–7090.
- (10) Ozsolak F, Platt AR, Jones DR, Reifenberger JG, Sass LE, McInerney P, Thompson JF, Bowers J, Jarosz M, and Milos PM (2009) Direct RNA sequencing. Nature 461, 814–818.
- (11) Garalde DR, Snell EA, Jachimowicz D, Sipos B, Lloyd JH, Bruce M, Pantic N, Admassu T, James P, Warland A, Jordan M, Ciccone J, Serra S, Keenan J, Martin S, McNeill L, Wallace EJ, Jayasinghe L, Wright C, Blasco J, Young S, Brocklebank D, Juul S, Clarke J, Heron AJ, and Turner DJ (2018) Highly parallel direct RNA sequencing on an array of nanopores. Nat. Methods 15, 201–206.
- (12) Taucher M, and Breuker K (2012) Characterization of modified RNA by top-down mass spectrometry. Angew. Chem., Int. Ed 51, 11289–11292.
- (13) Huang TY, Liu J, and McLuckey SA (2010) Top-down tandem mass spectrometry of tRNA via ion trap collision-induced dissociation. J. Am. Soc. Mass Spectrom 21, 890–898.
- (14) Suzuki T, and Suzuki T (2014) A complete landscape of post-transcriptional modifications in mammalian mitochondrial tRNAs. Nucleic Acids Res. 42, 7346–7357.
- (15) Thomas B, and Akoulitchev AV (2006) Mass spectrometry of RNA. Trends Biochem. Sci 31, 173–181.
- (16) Björkbom A, Lelyveld VS, Zhang S, Zhang W, Tam CP, Blain JC, and Szostak JW (2015) Bidirectional direct sequencing of noncanonical RNA by two-dimensional analysis of mass chromatograms. J. Am. Chem. Soc 137, 14430–14438.
- (17) Zhang N, Shi S, Jia TZ, Ziegler A, Yoo B, Yuan X, Li W, and Zhang S (2019) A general LC-MS-based RNA sequencing method for direct analysis of multiple-base modifications in RNA mixtures. Nucleic Acids Res. 47, e125.
- (18) Chi KR (2017) The RNA code comes into focus. Nature 542, 503–506.
- (19) Harcourt EM, Kietrys AM, and Kool ET (2017) Chemical and structural effects of base modifications in messenger RNA. Nature 541, 339–346.
- (20) Khoddami V, Yerra A, Mosbruger TL, Fleming AM, Burrows CJ, and Cairns BR (2019) Transcriptome-wide profiling of multiple RNA modifications simultaneously at single-base resolution. Proc. Natl. Acad. Sci. U. S. A 116, 6784–6789.
- (21) Chen Z, Qi M, Shen B, Luo G, Wu Y, Li J, Lu Z, Zheng Z, Dai Q, and Wang H (2019) Transfer RNA demethylase ALKBH3 promotes cancer progression via induction of tRNA-derived small RNAs. Nucleic Acids Res. 47, 2533–2545.
- (22) Motorin Y, Muller S, Behm-Ansmant I, and Branlant C (2007) Identification of modified residues in RNAs by reverse transcription-based methods. Methods Enzymol 425, 21–53.
- (23) Bakin A, and Ofengand J (1993) Four newly located pseudouridylate residues in Escherichia coli 23S ribosomal RNA are all at the peptidyltransferase center: analysis by the application of a new sequencing technique. Biochemistry 32, 9754–9762.
- (24) Wintermeyer W, and Zachau HG (1970) Specific Chemical Chain Scission of Transfer RNA at 7-Methylguanosine. FEBS Lett. 11, 160–164.
- (25) Marchand V, Ayadi L, Ernst FGM, Hertler J, Bourguignon-Igel V, Galvanin A, Kotter A, Helm M, Lafontaine DLJ, and Motorin Y (2018) AlkAniline-Seq: Profiling of m(7)G and m(3)C RNA Modifications at Single Nucleotide Resolution. Angew. Chem., Int. Ed 57, 16785–16790.
- (26) Sakurai M, and Suzuki T (2011) Biochemical identification of A-to-I RNA editing sites by the inosine chemical erasing (ICE) method. Methods Mol. Biol 718, 89–99.
- (27) Dominissini D, Moshitch-Moshkovitz S, Schwartz S, Salmon-Divon M, Ungar L, Osenberg S, Cesarkas K, Jacob-Hirsch J, Amariglio N, Kupiec M, Sorek R, and Rechavi G (2012) Topology of the human and mouse m6A RNA methylomes revealed by m6A-seq. Nature 485, 201–206.
- (28) Meyer KD, Saletore Y, Zumbo P, Elemento O, Mason CE, and Jaffrey SR (2012) Comprehensive analysis of mRNA methylation reveals enrichment in 3′ UTRs and near stop codons. Cell 149, 1635–1646.
- (29) Vendeix FA, Dziergowska A, Gustilo EM, Graham WD, Sproat B, Malkiewicz A, and Agris PF (2008) Anticodon domain modifications contribute order to tRNA for ribosome-mediated codon binding. Biochemistry 47, 6117–6129.
- (30) Erales J, Marchand V, Panthu B, Gillot S, Belin S, Ghayad SE, Garcia M, Laforêts F, Marcel V, Baudin-Baillieu A, et al. (2017) Evidence for rRNA 2′-O-methylation plasticity: Control of intrinsic translational capabilities of human ribosomes. Proc. Natl. Acad. Sci. U. S. A 114, 12934–12939.
- (31) McCown PJ, Ruszkowska A, Kunkler CN, Breger K, Hulewicz JP, Wang MC, Springer NA, and Brown JA (2020) Naturally occurring modified ribonucleosides. Wiley Interdiscip. Rev. RNA, e1595.
- (32) Byrne RT, Konevega AL, Rodnina MV, and Antson AA (2010) The crystal structure of unmodified tRNA Phe from Escherichia coli. Nucleic Acids Res. 38, 4154–4162.
- (33) Edqvist J, Grosjean H, and Stråby KB (1992) Identity elements for N2-dimethylation of guanosine-26 in yeast tRNAs. Nucleic Acids Res. 20, 6575–6581.
- (34) Bavi RS, Sambhare SB, and Sonawane KD (2013) MD simulation studies to investigate iso-energetic conformational behaviour of modified nucleosides m2G and m22G present in tRNA. Comput. Struct. Biotechnol. J 5, e201302015.
- (35) Zhang SL, Blain JC, Zielinska D, Gryaznov SM, and Szostak JW (2013) Fast and accurate nonenzymatic copying of an RNA-like synthetic genetic polymer. Proc. Natl. Acad. Sci. U. S. A 110, 17732–17737.
- (36) Wang X, and He C (2014) Dynamic RNA modifications in posttranscriptional regulation. Mol. Cell 56, 5–12.
- (37) Meyer KD, and Jaffrey SR (2014) The dynamic epitranscriptome: N6-methyladenosine and gene expression control. Nat. Rev. Mol. Cell Biol 15, 313–326.
- (38) RajBhandary UL, Faulkner RD, and Stuart A (1968) Studies on polynucleotides. LXXIX. Yeast phenylalanine transfer ribonucleic acid: products obtained by degradation with pancreatic ribonuclease. J. Biol. Chem 243, 575–583.
- (39) Ladner JE, and Schweizer MP (1974) Effects of dilute HCl on yeast tRNAPhe and E. coli tRNA1fMet. Nucleic Acids Res. 1, 183–192.
- (40) Fang B, Wang D, Huang M, Yu G, and Li H (2010) Hypothesis on the relationship between the change in intracellular pH and incidence of sporadic Alzheimer’s disease or vascular dementia. Int. J. Neurosci 120, 591–595.
- (41) Merryman C, Weinstein E, Wnuk SF, and Bartel DP (2002) A bifunctional tRNA for in vitro selection. Chem. Biol 9, 741–746.
- (42) Thompson DM, and Parker R (2009) Stressing Out over tRNA Cleavage. Cell 138, 215–219.
- (43) Hou YM (2010) CCA addition to tRNA: implications for tRNA quality control. IUBMB Life 62, 251–260.
- (44) Alzner-DeWeerd B, Hecker LI, Barnett WE, and RajBhandary UL (1980) The nucleotide sequence of phenylalanine tRNA from the cytoplasm of Neurospora crassa. Nucleic Acids Res. 8, 1023–1032.
- (45) Wang Y, and Gu M (2010) The concept of spectral accuracy for MS. Anal. Chem 82, 7055–7062.