PLOS ONE. 2020 Oct 1;15(10):e0239287. doi: 10.1371/journal.pone.0239287

RNAthor – fast, accurate normalization, visualization and statistical analysis of RNA probing data resolved by capillary electrophoresis

Julita Gumna 1, Tomasz Zok 2, Kacper Figurski 2, Katarzyna Pachulska-Wieczorek 1,*, Marta Szachniuk 1,2,*
Editor: Danny Barash
PMCID: PMC7529196  PMID: 33002005

Abstract

RNAs adopt specific structures to perform their functions, which are critical to fundamental cellular processes. For decades, these structures have been determined and modeled with strong support from computational methods. Still, the accuracy of the latter depends on the availability of experimental data, for example, chemical probing information that can define pseudo-energy constraints for RNA folding algorithms. At the same time, diverse computational tools have been developed to facilitate the analysis and visualization of data from RNA structure probing experiments followed by capillary electrophoresis or next-generation sequencing. RNAthor, a new software tool for the fully automated normalization of SHAPE and DMS probing data resolved by capillary electrophoresis, has recently joined this collection. RNAthor automatically identifies unreliable probing data, normalizes the reactivity information to a uniform scale, and uses it in RNA secondary structure prediction. Our web server also provides tools for fast and easy visualization and statistical analysis of RNA probing data, facilitating the comparison of multiple data sets. RNAthor is freely available at http://rnathor.cs.put.poznan.pl/.

Introduction

Structural features are important for the biological functions of RNA molecules. Specific RNA structures are recognized by RNA-binding proteins, ligands, and other RNAs; these interactions impact almost every aspect of cell life and viral replication. Therefore, there is great interest in developing novel approaches for accurate and rapid RNA structure modeling. Computational methods can produce good-quality models of short RNAs from sequence alone, but the accuracy of structure prediction decreases with RNA length [1-5]. The inclusion of RNA structure probing data as pseudo-energy constraints in thermodynamic folding algorithms significantly improves the accuracy of RNA structure prediction [6, 7]. Among chemical and enzymatic methods, SHAPE (selective 2′-hydroxyl acylation analyzed by primer extension) [8] and DMS (dimethyl sulfate) mapping [9] are the best validated and most widely used techniques for RNA structure probing in vitro and in vivo [10, 11]. Moreover, the pipelines for incorporating SHAPE and DMS probing data into RNA structure prediction software are well established [6, 12]. DMS modifies the Watson-Crick edge of unpaired adenosines or cytosines, whereas SHAPE reagents create covalent adducts at the 2′-OH group of the RNA sugar ring in a flexibility-sensitive manner [8, 9, 13]. Several SHAPE reagents that differ in their half-life and solubility have been developed to date [14-16]. They act independently of the nitrogenous base; consequently, one probing reagent can be used instead of a combination of base-specific chemicals.

Effective detection and quantitative measurement of modification sites are critical for all RNA probing experiments. Typically, RNA chemical modification is followed by reverse transcription to cDNA that is truncated or mutated at the adduct position [8, 17]. The sites of RT stops in the cDNA can be read out using capillary electrophoresis (CE) or next-generation sequencing (NGS), but only the latter can be used for the detection of adduct-induced mutations [17, 18]. NGS-based techniques allow genome-wide and transcriptome-wide profiling of RNA structure. CE is widely used for resolving reactivity data from medium- and low-throughput RNA probing experiments. SHAPE-CE has been used to analyze the structures of many important RNAs, including ribosomal RNAs [19, 20], long noncoding RNAs [21-23], viral RNAs [24-30], and retrotransposon RNAs [31-33]. Besides, CE can also be used for the analysis of RNA probing experiments utilizing other chemical reagents such as CMCT, kethoxal, hydroxyl radicals, and RNases [34-37].

The extraction of quantitative data from CE electropherograms is challenging and requires complicated, multistep analysis of fluorescence signals. Several computational tools can process electropherograms from SHAPE-CE experiments [38-42]. Among them, ShapeFinder [41] and QuShape [42] are the most widely used and yield high-quality SHAPE reactivity data for 300-600 nucleotides in one experiment. Before the incorporation of probing information into thermodynamic RNA folding algorithms, the reactivity values must be normalized to a uniform scale that is valid for diverse RNAs. Additionally, visual inspection for nonspecific RT strong-stops (not induced by adduct formation) is required.

Normalization and other quality control steps are very important aspects of structure probing data analysis. Therefore, we developed RNAthor, a user-friendly tool for fast, automatic normalization and analysis of CE-based RNA probing data (Fig 1). Features of our tool include (i) normalization of data from several experiments in the box-plot scheme at once, (ii) automatic detection of strong-stops of reverse transcriptase, (iii) reactivity data visualization, (iv) statistical analysis of the results to compare multiple data sets, and (v) RNA secondary structure prediction based on reactivity data.

Fig 1. Workflow in the RNAthor system.


Materials and methods

RNAthor workflow

In the RNAthor workflow, we distinguish five general stages: validation of the input data (ShapeFinder or QuShape file(s) and optionally RNA sequence), exclusion of unreliable data, normalization of probing data, prediction of the secondary structure (optional), and statistical analysis of the normalized data (optional) (Fig 1).

Validation of the input data

First, the user-uploaded files, produced by ShapeFinder or QuShape, are parsed and their format is validated. If a sequence is additionally entered, RNAthor checks whether it is RNA and whether it is at least as long as the sequence in the input file(s). Upon successful validation, the computational process continues to the next step. Otherwise, the user receives an error message and is asked to provide correct data.
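A minimal sketch of this validation logic in Python is shown below; the function name and error messages are hypothetical illustrations, not RNAthor's actual code.

```python
# Hypothetical sketch of the input-validation step (names and details assumed).
from typing import List, Optional

RNA_ALPHABET = set("ACGU")

def validate_input(sequence: Optional[str], parsed_lengths: List[int]) -> None:
    """Validate an optional user sequence against parsed ShapeFinder/QuShape files.

    parsed_lengths: lengths of the sequences found in the uploaded input file(s).
    Raises ValueError with a user-facing message when validation fails.
    """
    if sequence is None:
        return  # the sequence is optional; file-format validation happens earlier
    seq = sequence.strip().upper()
    if not set(seq) <= RNA_ALPHABET:
        raise ValueError("The sequence is not RNA (allowed characters: A, C, G, U).")
    if any(len(seq) < n for n in parsed_lengths):
        raise ValueError("The sequence is shorter than the sequence in an input file.")
```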

Exclusion of unreliable data

Unreliable data usually correspond to premature terminations of the primer extension reaction due to reasons other than adduct formation (e.g., preexisting cleavage or modification in the RNA). These nucleotide positions are called RT strong-stops. RNAthor offers two ways of detecting such data and excluding them from further processing: a fully automated algorithm and an interactive procedure requiring manual selection. The automated procedure (Fig 2) was implemented based on our experience with analyzing data from RNA chemical probing experiments in vitro and in vivo, and was optimized for the analysis of SHAPE and DMS probing data. It eliminates data that meet at least one of the following criteria: (i) the absolute reactivity value is negative; (ii) the background peak area is at least five times larger than the average background peak area; (iii) the difference in peak areas between background and reaction is less than 35% of the average background peak area, and the background peak area at this position is equal to or greater than that average. The alternative is a manual procedure, recommended especially for processing data from RNA probing experiments other than SHAPE or DMS probing. In this approach, users can identify unreliable data according to their own experience. They define the negative reactivity threshold and indicate how to treat negative reactivity values: they can be left as negative values, changed to 0, or marked as no data. RNAthor displays a histogram with peak areas for the modification reaction and the background at each nucleotide. Based on this view, users manually select RT strong-stop positions. All identified RT strong-stops are then excluded from the normalization step.
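Read literally, the three rules map onto a few lines of NumPy. The sketch below is an illustration, not RNAthor's actual code: it assumes that "absolute reactivity" means the background-subtracted peak area and treats the "difference in peak areas" as an absolute value, since the text does not spell out either detail.

```python
import numpy as np

def flag_rt_strong_stops(reaction: np.ndarray, background: np.ndarray) -> np.ndarray:
    """Flag unreliable positions per the three criteria above (illustrative sketch).

    reaction, background: per-nucleotide peak areas from ShapeFinder/QuShape.
    Returns a boolean mask; True marks positions excluded from normalization.
    """
    avg_bg = background.mean()
    reactivity = reaction - background  # assumed meaning of "absolute reactivity"
    negative = reactivity < 0                                         # criterion (i)
    huge_background = background >= 5.0 * avg_bg                      # criterion (ii)
    small_difference = (np.abs(reaction - background) < 0.35 * avg_bg) \
        & (background >= avg_bg)                                      # criterion (iii)
    return negative | huge_background | small_difference
```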

Fig 2. Scheme of the RNAthor algorithm for unreliable data exclusion.


Data normalization

In this step, the data are brought into proportion with one another and outliers are removed, to provide users with easy-to-interpret reactivity data on a uniform scale. RNAthor applies the standard box-plot scheme recommended for normalizing SHAPE-CE data [43, 44]. The normalization process involves identifying outliers, determining the effective maximum reactivity, and calculating the normalized reactivity values. The initial task is to determine the first (Q1) and third (Q3) quartiles and the interquartile range (IQR), and to compute the upper extreme: UP = Q3 + 1.5(IQR). Reactivities greater than UP are considered outliers and are excluded from subsequent calculations, subject to the following rule: for RNAs longer than 100 nucleotides, no more than 10% of the data are identified as outliers; for shorter RNAs, at most 5% of the data are removed. The remaining values are used to compute the effective maximum reactivity, i.e., the average of the top 8% of reactivity values. Finally, all absolute reactivity values are divided by the effective maximum reactivity. This yields normalized reactivity data on a uniform scale, where values close to 0 indicate no reactivity (highly constrained nucleotides) and values greater than 0.85 correspond to high reactivity (flexible nucleotides).
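The box-plot scheme is compact enough to sketch directly. The illustrative NumPy version below follows the description above, with the top 8% computed over the outlier-filtered values as stated in the text; it is a sketch, not RNAthor's implementation.

```python
import numpy as np

def boxplot_normalize(reactivity: np.ndarray) -> np.ndarray:
    """Box-plot normalization as described above (illustrative sketch).

    reactivity: absolute reactivities; excluded positions should be NaN.
    """
    vals = np.sort(reactivity[~np.isnan(reactivity)])
    q1, q3 = np.percentile(vals, [25, 75])
    upper = q3 + 1.5 * (q3 - q1)                  # UP = Q3 + 1.5 * IQR
    cap = 0.10 if len(vals) > 100 else 0.05       # outlier cap: 10% (>100 nt) or 5%
    n_out = min(int((vals > upper).sum()), int(cap * len(vals)))
    kept = vals[:len(vals) - n_out]               # drop the n_out largest values
    n_top = max(1, int(round(0.08 * len(kept))))  # top 8% of the remaining values
    effective_max = kept[-n_top:].mean()
    return reactivity / effective_max             # uniform, dimensionless scale
```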

Secondary structure prediction

Optionally, users can obtain the secondary structure predicted for the RNA sequence provided at the input. If a sequence is given, RNAthor automatically executes the incorporated RNAstructure algorithm [45], which supports SHAPE/DMS data-driven prediction. It takes the RNA sequence and the normalized probing data and generates the corresponding secondary structure. The graphical diagram of the structure is colored according to the color scheme defined for the default reactivity ranges. The output structure is also encoded in dot-bracket notation.
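For readers who want to reproduce this step outside the web server, an approximately equivalent pipeline can be driven through RNAstructure's command-line Fold and ct2dot programs. RNAthor embeds RNAstructure internally, so the sketch below is only an approximation of this stage; the file names are hypothetical and the SHAPE slope/intercept parameters are left at their defaults.

```python
# Sketch of an equivalent command-line pipeline using RNAstructure [45];
# RNAthor integrates RNAstructure directly, so this is an approximation.
import subprocess

def fold_with_shape(seq_file: str, shape_file: str, ct_file: str, dot_file: str) -> None:
    # Fold the sequence with SHAPE pseudo-energy restraints (default parameters)
    subprocess.run(["Fold", seq_file, ct_file, "--SHAPE", shape_file], check=True)
    # Convert the lowest-energy structure (number 1) to dot-bracket notation
    subprocess.run(["ct2dot", ct_file, "1", dot_file], check=True)

fold_with_shape("ty1.fasta", "ty1.shape", "ty1.ct", "ty1.dot")  # hypothetical files
```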

Statistical analysis of the normalized data

Logged-in users can perform additional statistical analysis of the normalized probing data. The analysis covers 2-5 experiments selected by the user. It consists of the Shapiro-Wilk test for normality of the data distribution, the Bartlett test for homogeneity of variance, the non-parametric Mann-Whitney test (if the user selected 2 experiments), and the Kruskal-Wallis rank-sum test (if the user selected 3-5 experiments). The two latter tests are performed if the probing data depart from the normal distribution. As a result of the analysis, users receive numerical, textual, and graphical output, including a comparative step plot, a box-and-whisker plot, and a violin plot.
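These tests correspond to standard SciPy routines. The sketch below mirrors the cascade described above; the 0.05 significance level is our assumption, as the paper does not state which threshold RNAthor applies.

```python
from typing import Dict, List
import numpy as np
from scipy import stats

def analyze(datasets: List[np.ndarray], alpha: float = 0.05) -> Dict:
    """Sketch of the statistical cascade described above (2-5 experiments).

    datasets: normalized reactivity arrays with missing values already removed.
    alpha: significance level (assumed here; not stated in the paper).
    """
    normal = all(stats.shapiro(d).pvalue > alpha for d in datasets)  # Shapiro-Wilk
    homogeneous = stats.bartlett(*datasets).pvalue > alpha           # Bartlett
    out = {"normal": normal, "equal_variances": homogeneous}
    if not normal:  # non-parametric comparison when normality is rejected
        if len(datasets) == 2:
            out["test"] = ("Mann-Whitney U", stats.mannwhitneyu(*datasets).pvalue)
        else:
            out["test"] = ("Kruskal-Wallis", stats.kruskal(*datasets).pvalue)
    return out
```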

Experimental setup

RNA probing data for the RNAthor validation were obtained from SHAPE-CE and DMS-CE experiments performed in our laboratory for Ty1 RNA (+1-560). The results of the SHAPE-based manual analysis have been published previously [32]. The DMS experiment was performed especially for this work; its details are presented below. Electropherograms from the SHAPE and DMS probing were processed using the ShapeFinder software according to the authors' instructions [41].

For RNA probing with DMS, RNA (8 pmol) was refolded in 30 μl of renaturation buffer (10 mM Tris-HCl pH 8.0, 100 mM KCl, and 0.1 mM EDTA) by heating for 3 minutes at 95°C and slow cooling to 4°C, followed by the addition of 90 μl of water and 30 μl of 5x folding buffer (final concentration: 40 mM Tris-HCl pH 8.0, 130 mM KCl, 0.2 mM EDTA, 5 mM MgCl2) and incubation for 20 minutes at 37°C. The RNA sample was divided into two tubes, treated with DMS dissolved in ethanol (+) or ethanol alone (-), and incubated at room temperature for 1 minute. The reaction was quenched by the addition of 14.7 M β-mercaptoethanol. RNA was recovered by ethanol precipitation and resuspended in 10 μl of water. Primer extension reactions were performed using fluorescently labeled primers [Cy5 (+) and Cy5.5 (-)] as described previously [32]. Sequencing ladders were prepared using primers labeled with WellRed D2 (ddA) and LicorIRD-800 (ddT) and a Thermo Sequenase Cycle Sequencing kit (Applied Biosystems) according to the manufacturer's protocol. Samples were analyzed on a GenomeLab GeXP Analysis System (Beckman-Coulter).

Web application

RNAthor, implemented as a publicly available web server, has a simple and intuitive interface. It runs on all major web browsers and is accessible at http://rnathor.cs.put.poznan.pl/. The web service is hosted and maintained by the Institute of Computing Science, Poznan University of Technology, Poland.

Implementation details

The architecture of RNAthor comprises two components: the computational engine (backend) and the web application (frontend). The backend, implemented in Java (OpenJDK 8.0), uses selected modules of the Spring Framework: Spring Boot 2.1.6 enables fast configuration of the application; Spring Security provides user authentication and basic security; Spring MVC supports the Model-View-Controller pattern and the Apache Tomcat server; Spring Test, via the JUnit and Mockito libraries, enables unit and integration testing; Spring Data provides comprehensive database services, including transaction management. The user interface (frontend) is implemented in Angular. User data and basic information about the experiments are stored in a PostgreSQL relational database; input and output data are saved on the server's disk. The tool is distributed under the Apache License 2.0.

Input and output description

As input, RNAthor accepts ShapeFinder or QuShape output files in tab-delimited text format. Users upload their data via the New experiment page by selecting 1-15 files from a local folder. In a multiple-file input, all files should come from repetitions of the RNA probing experiment performed on the same RNA; repetitions increase the reliability of structural data for RNA secondary structure prediction. RNAthor processes all the input files in a single run, which starts after the data are uploaded and additional parameters for the normalization are set (algorithm for RT strong-stop detection, probing reagent, color settings). Additionally, users can provide an RNA sequence, which is then used to predict the RNA secondary structure.

RNAthor generates a selection of output data. First of all, users obtain an output file in the SHAPE format (*.shape) that is compatible with the RNAstructure software [45]. The file comprises two columns, with nucleotide positions and normalized SHAPE/DMS-CE reactivity data. For a multiple-file input, the generated SHAPE file contains averaged reactivities from all normalized data. Nucleotides for which there are no reactivity data are assigned the value -999, as recommended in [43]. If the user uploaded the sequence of the analyzed RNA molecule, RNAthor provides the RNA secondary structure in dot-bracket notation and as a graphical diagram.

Additionally, RNAthor generates files that can facilitate the analysis of RNA probing experiments. One of them is an MS Excel file with spreadsheets containing the input data, the normalized reactivity values, and the averaged normalized reactivity data with standard deviation (the average and standard deviation are calculated separately for each nucleotide across samples). Each spreadsheet with input data contains a histogram identical to the one created during manual removal of RT strong-stops. Rows with normalized reactivity values are colored according to the user's settings. In the processing of large RNAs, this file can help to combine probing data from overlapping reads (obtained with different sets of primers). RNAthor also prepares graphical output: a step plot and a bar plot presenting the reactivity profile for one experiment or averaged data from several repetitions. By default, the bar plot is colored black for reactivities in [0, 0.4), orange for reactivities in [0.4, 0.85), and red for reactivities in [0.85, ∞).

Logged-in users who run statistical analysis of experimental data also obtain a comparative step plot, a box-and-whisker plot, a violin plot, and a summary of test results. The latter, available for download as a .txt file, states whether the uploaded data come from a normal distribution, whether they have equal variances, which statistical test was performed, and what the p-value is. The comparative step plot shows the reactivity profiles of all compared experiments in one chart. The box-and-whisker plot displays the distribution of the data based on position measures such as quartiles, minimum, and maximum. The violin plot presents the shape of the distribution and the probability density of normalized reactivity values. All generated plots can be saved in PNG or EPS format.

Users download the output files separately or in a single zipped archive. They can also obtain them as an email attachment if an email address was provided at the input; the email also contains a unique link to the result page. The results are stored in the system for 3 days (for guest users) or 3 months (for logged-in users). Logged-in users can extend the storage time by an additional month.
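For reference, the SHAPE format mentioned above is a plain two-column text file. A minimal writer following the -999 convention for missing data might look as follows (an illustration, not RNAthor's code):

```python
import math
from typing import List, Optional

def write_shape_file(path: str, reactivities: List[Optional[float]]) -> None:
    """Write normalized reactivities as a two-column SHAPE file (sketch).

    Positions are 1-based; missing data are encoded as -999, per [43].
    """
    with open(path, "w") as fh:
        for pos, value in enumerate(reactivities, start=1):
            if value is None or math.isnan(value):
                fh.write(f"{pos}\t-999\n")
            else:
                fh.write(f"{pos}\t{value:.3f}\n")
```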

Results

RNAthor allows for efficient, automated processing and analysis of RNA probing data from SHAPE-CE and DMS-CE experiments and their use in data-driven RNA secondary structure prediction. It was tested on multiple datasets containing data from SHAPE and DMS probing experiments resolved by capillary electrophoresis. The tests confirmed the reliability of the results and showed the utility of the tool. Here, we describe the experiments performed to compare the results of RNA probing data analysis carried out manually by an expert and automatically by RNAthor. For these experiments, we chose SHAPE-CE and DMS-CE probing data obtained for the RNA of the yeast retrotransposon Ty1. The structure of the 5′ end of Ty1 RNA has been extensively studied and determined under different experimental conditions and biological states [31, 32, 46].

In the first test, we executed RNAthor on the ShapeFinder-generated files containing the probing data obtained from three independent replicates of the SHAPE experiment (raw data used in this experiment are provided in the S1 File). We ran RNAthor with the default settings and the automated algorithm for the identification of RT strong-stops. The generated normalized reactivity data were then compared to the corresponding data published in [32], which resulted from manual analysis of the same input. We aligned the obtained bar plots (Fig 3A) and computed the correlation between normalized reactivity values (Fig 3B). In the second test, we repeated the same procedure for data obtained from the DMS experiment (unpublished data; the experiment was carried out especially for this work). A “blind” human experimentalist analyzed the DMS data preprocessed using ShapeFinder, normalized the reactivity values, manually identified unreliable data, and applied OriginPro to generate a bar plot presenting the reactivity profile. The results of this manual processing were compared to the output generated by RNAthor, executed with the DMS reagent selected and automated identification of RT strong-stops (Fig 3C and 3D).

Fig 3. Automatic and manual normalization of RNA probing data.


SHAPE (A) and DMS (C) reactivity profiles calculated by RNAthor (red) and manually (black). Correlation between RNAthor-derived and manually derived per-nucleotide reactivities estimated for SHAPE (B) and DMS (D).

From these experiments, we observe that all RT strong-stops identified manually by the expert are also selected for exclusion by the automatic algorithm implemented in RNAthor. On the other hand, a few data points assigned as RT strong-stops by RNAthor could be considered reliable in the human-dependent analysis. This is due to the strict criteria for determining RT strong-stops adopted in the algorithm. Table 1 presents the results of a detailed comparison between manual, expert-driven detection of unreliable data and automatic detection performed by RNAthor. We computed basic measures used to evaluate the quality of binary classification: true positives (TP), data classified as reliable by both the expert and RNAthor; true negatives (TN), data classified as unreliable by both the expert and RNAthor; false positives (FP), data indicated as unreliable by the expert but classified as reliable by RNAthor; false negatives (FN), data indicated as reliable by the expert but classified as unreliable by RNAthor. Using these measures, we calculated the accuracy (ACC), sensitivity (TPR, true positive rate), specificity (TNR, true negative rate), and precision (PPV, positive predictive value) of the automatic algorithm implemented in RNAthor. All these measures were determined for three datasets: the SHAPE probing data, the DMS probing data, and both sets together. They prove the high quality of the tested algorithm for all datasets. Accuracy and sensitivity equal 0.99, where accuracy, ACC = (TP+TN)/(TP+TN+FP+FN), represents the ratio of correct classifications to the total number of input data, and sensitivity, TPR = TP/(TP+FN), indicates what part of the actually reliable data has been correctly classified by RNAthor. Specificity and precision both equal 1, which follows from FP = 0. Specificity, TNR = TN/(TN+FP), is the fraction of correctly classified unreliable data, while precision, PPV = TP/(TP+FP), gives the fraction of data classified as reliable that is actually reliable. Finally, the experiments show that, despite some differences between the expert- and RNAthor-driven analyses, the normalized RNA probing reactivity values obtained in both approaches are highly similar. The comparison of the reactivity profiles indicates the conformity of the manual and automatic procedures. The averaged results from three independent probing experiments yield a Spearman correlation coefficient of 0.9987 for the SHAPE-based and 0.995 for the DMS-based analysis (Fig 3).

Table 1. The results of validation of RNAthor algorithm for unreliable data identification.

dataset   TP     TN   FP   FN   ACC    PPV   TPR    TNR
SHAPE     2462   38   0    32   0.99   1     0.99   1
DMS       2431   35   0    25   0.99   1     0.99   1
ALL       4893   73   0    57   0.99   1     0.99   1
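As a quick sanity check, the metrics in Table 1 can be recomputed directly from the confusion-matrix counts; for example, for the SHAPE row:

```python
# Recomputing the SHAPE row of Table 1 from its TP/TN/FP/FN counts.
tp, tn, fp, fn = 2462, 38, 0, 32

acc = (tp + tn) / (tp + tn + fp + fn)  # accuracy
tpr = tp / (tp + fn)                   # sensitivity (true positive rate)
tnr = tn / (tn + fp)                   # specificity (1.0 because FP = 0)
ppv = tp / (tp + fp)                   # precision   (1.0 because FP = 0)

print(f"ACC={acc:.2f}, TPR={tpr:.2f}, TNR={tnr:.2f}, PPV={ppv:.2f}")
# -> ACC=0.99, TPR=0.99, TNR=1.00, PPV=1.00, matching Table 1
```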

In the testing phase, we also executed statistical analysis to verify the repeatability of the obtained results for each nucleotide between the replicates and to compare the reactivity profiles. Fig 4 shows an example of such verification for selected DMS-CE experiments processed by RNAthor (raw data used in these experiments are provided in the S1 File). Experiments 1 and 2 (denoted DMSexp1 and DMSexp2 in Fig 4) were performed under identical experimental conditions, while a higher concentration of DMS was used in experiment 3 (denoted DMSexp3). We observed a high similarity between the reactivity profiles generated for experiments 1 and 2 (Fig 4A), whereas a significant difference was visible for experiment 3 (Fig 4B). As expected, the box plot and violin plot show comparable DMS data distributions for experiments 1 and 2 (Fig 4C). The statistical plots for experiment 3 clearly show a significant increase in the number of more reactive nucleotides and a concurrent decrease in unreactive nucleotides; consequently, the overall median reactivity is higher (Fig 4C). These examples show that the additional options of RNAthor can be used for fast and easy comparative and statistical analysis of RNA chemical probing experiments.

Fig 4. Example verification of the repeatability of RNA chemical probing experiments.


Comparative step plot for repeatable (A) and non-repeatable (B) replicates of Ty1 RNA probing with DMS. (C) The box-and-whisker plot and violin plot presenting the differences in reactivity data distribution obtained for repeatable and non-repeatable experiments.

Conclusions

In this work, we presented RNAthor, a new computational tool dedicated to the study of RNA structure that enriches the set of web-interfaced bioinformatics systems available within the RNApolis project [47]. RNAthor was designed for fully automatic, rapid normalization and analysis of SHAPE/DMS-CE data. Although several programs can process the results of CE-based RNA probing, until now no automatic procedure could identify unreliable data, and this step of the analysis was usually done manually. RNAthor incorporates an algorithm for the automatic exclusion of RT strong-stops to minimize user involvement in probing data analysis. The tool can also be applied to data from other RNA probing methods, provided that capillary electrophoresis and ShapeFinder or QuShape were used for data collection. RNAthor additionally visualizes the results of RNA probing data normalization, runs data-driven prediction of RNA secondary structure, and performs statistical tests. The latter option facilitates the comparative study of multiple probing experiments: it allows users to assess the compatibility between experiments and to compare whole data sets of RNAs probed under different experimental conditions (e.g., in vitro, in vivo, ex vivo, in virio, ex virio) or in the absence or presence of a protein/ligand. Compared to manual or semi-automated data processing, RNAthor significantly reduces the time needed for data analysis; thus, it can greatly improve the study and interpretation of data obtained from RNA chemical probing experiments.

In the future, we plan to extend the functionality of RNAthor by implementing procedures combining RNA probing data from overlapping CE reads to facilitate the structural analysis of large RNAs.

Supporting information

S1 File

(PDF)

Data Availability

All relevant data are within the manuscript and its Supporting Information files.

Funding Statement

This work was funded by the National Science Centre Poland (https://ncn.gov.pl/?language=en) in the form of grants awarded to MS (2016/23/B/ST6/03931, 2019/35/B/ST6/03074) and KPW (2016/22/E/NZ3/00426). Funding for the open access charge was provided by the statutory funds of Poznan University of Technology. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Waterman MS, Smith TF (1978) RNA secondary structure: complete mathematical analysis. Mathematical Biosciences 42: 257–266.
  • 2. Doshi KJ, Cannone JJ, Cobaugh CW, Gutell RR (2004) Evaluation of the suitability of free-energy minimization using nearest-neighbor energy parameters for RNA secondary structure prediction. BMC Bioinformatics 5: 105. doi: 10.1186/1471-2105-5-105
  • 3. Popenda M, Szachniuk M, Antczak M, Purzycka KJ, Lukasiak P, et al. (2012) Automated 3D structure composition for large RNAs. Nucleic Acids Res 40: e112. doi: 10.1093/nar/gks339
  • 4. Biesiada M, Pachulska-Wieczorek K, Adamiak RW, Purzycka KJ (2016) RNAComposer and RNA 3D structure prediction for nanotechnology. Methods 103: 120–127. doi: 10.1016/j.ymeth.2016.03.010
  • 5. Lukasiak P, Antczak M, Ratajczak T, Szachniuk M, Popenda M, et al. (2015) RNAssess—a web server for quality assessment of RNA 3D structures. Nucleic Acids Res 43: W502–506. doi: 10.1093/nar/gkv557
  • 6. Mathews DH, Disney MD, Childs JL, Schroeder SJ, Zuker M, et al. (2004) Incorporating chemical modification constraints into a dynamic programming algorithm for prediction of RNA secondary structure. Proc Natl Acad Sci U S A 101: 7287–7292. doi: 10.1073/pnas.0401799101
  • 7. Ge P, Zhang S (2015) Computational analysis of RNA structures with chemical probing data. Methods 79–80: 60–66.
  • 8. Merino EJ, Wilkinson KA, Coughlan JL, Weeks KM (2005) RNA structure analysis at single nucleotide resolution by selective 2'-hydroxyl acylation and primer extension (SHAPE). J Am Chem Soc 127: 4223–4231. doi: 10.1021/ja043822v
  • 9. Inoue T, Cech TR (1985) Secondary structure of the circular form of the Tetrahymena rRNA intervening sequence: a technique for RNA structure analysis using chemical probes and reverse transcriptase. Proc Natl Acad Sci U S A 82: 648–652. doi: 10.1073/pnas.82.3.648
  • 10. Spitale RC, Flynn RA, Torre EA, Kool ET, Chang HY (2014) RNA structural analysis by evolving SHAPE chemistry. Wiley Interdiscip Rev RNA 5: 867–881. doi: 10.1002/wrna.1253
  • 11. Mitchell D 3rd, Assmann SM, Bevilacqua PC (2019) Probing RNA structure in vivo. Curr Opin Struct Biol 59: 151–158. doi: 10.1016/j.sbi.2019.07.008
  • 12. Sloma MF, Mathews DH (2015) Improving RNA secondary structure prediction with structure mapping data. Methods Enzymol 553: 91–114. doi: 10.1016/bs.mie.2014.10.053
  • 13. McGinnis JL, Dunkle JA, Cate JH, Weeks KM (2012) The mechanisms of RNA SHAPE chemistry. J Am Chem Soc 134: 6617–6624. doi: 10.1021/ja2104075
  • 14. Mortimer SA, Weeks KM (2007) A fast-acting reagent for accurate analysis of RNA secondary and tertiary structure by SHAPE chemistry. J Am Chem Soc 129: 4144–4145. doi: 10.1021/ja0704028
  • 15. Spitale RC, Crisalli P, Flynn RA, Torre EA, Kool ET, et al. (2013) RNA SHAPE analysis in living cells. Nat Chem Biol 9: 18–20. doi: 10.1038/nchembio.1131
  • 16. Busan S, Weidmann CA, Sengupta A, Weeks KM (2019) Guidelines for SHAPE reagent choice and detection strategy for RNA structure probing studies. Biochemistry 58: 2655–2664. doi: 10.1021/acs.biochem.8b01218
  • 17. Siegfried NA, Busan S, Rice GM, Nelson JA, Weeks KM (2014) RNA motif discovery by SHAPE and mutational profiling (SHAPE-MaP). Nat Methods 11: 959–965. doi: 10.1038/nmeth.3029
  • 18. Mailler E, Paillart JC, Marquet R, Smyth RP, Vivet-Boudou V (2019) The evolution of RNA structural probing methods: from gels to next-generation sequencing. Wiley Interdiscip Rev RNA 10: e1518. doi: 10.1002/wrna.1518
  • 19. Deigan KE, Li TW, Mathews DH, Weeks KM (2009) Accurate SHAPE-directed RNA structure determination. Proc Natl Acad Sci U S A 106: 97–102. doi: 10.1073/pnas.0806929106
  • 20. Abeysirigunawardena SC, Kim H, Lai J, Ragunathan K, Rappe MC, et al. (2017) Evolution of protein-coupled RNA dynamics during hierarchical assembly of ribosomal complexes. Nat Commun 8: 492. doi: 10.1038/s41467-017-00536-1
  • 21. Garcia GR, Goodale BC, Wiley MW, La Du JK, Hendrix DA, et al. (2017) In vivo characterization of an AHR-dependent long noncoding RNA required for proper Sox9b expression. Mol Pharmacol 91: 609–619. doi: 10.1124/mol.117.108233
  • 22. Liu F, Somarowthu S, Pyle AM (2017) Visualizing the secondary and tertiary architectural domains of lncRNA RepA. Nat Chem Biol 13: 282–289. doi: 10.1038/nchembio.2272
  • 23. Owens MC, Clark SC, Yankey A, Somarowthu S (2019) Identifying structural domains and conserved regions in the long non-coding RNA lncTCF7. Int J Mol Sci 20.
  • 24. Wilkinson KA, Gorelick RJ, Vasa SM, Guex N, Rein A, et al. (2008) High-throughput SHAPE analysis reveals structures in HIV-1 genomic RNA strongly conserved across distinct biological states. PLoS Biol 6: e96. doi: 10.1371/journal.pbio.0060096
  • 25. Soszynska-Jozwiak M, Michalak P, Moss WN, Kierzek R, Kesy J, et al. (2017) Influenza virus segment 5 (+)RNA—secondary structure and new targets for antiviral strategies. Sci Rep 7: 15041. doi: 10.1038/s41598-017-15317-5
  • 26. Michalak P, Soszynska-Jozwiak M, Biala E, Moss WN, Kesy J, et al. (2019) Secondary structure of the segment 5 genomic RNA of influenza A virus and its application for designing antisense oligonucleotides. Sci Rep 9: 3801. doi: 10.1038/s41598-019-40443-7
  • 27. Mahmud B, Horn CM, Tapprich WE (2019) Structure of the 5' untranslated region of enteroviral genomic RNA. J Virol 93.
  • 28. Newburn LR, White KA (2020) A trans-activator-like structure in RCNMV RNA1 evokes the origin of the trans-activator in RNA2. PLoS Pathog 16: e1008271. doi: 10.1371/journal.ppat.1008271
  • 29. Khoury G, Mackenzie C, Ayadi L, Lewin SR, Branlant C, et al. (2020) Tat IRES modulator of tat mRNA (TIM-TAM): a conserved RNA structure that controls Tat expression and acts as a switch for HIV productive and latent infection. Nucleic Acids Res 48: 2643–2660. doi: 10.1093/nar/gkz1181
  • 30. Kendall C, Khalid H, Muller M, Banda DH, Kohl A, et al. (2019) Structural and phenotypic analysis of Chikungunya virus RNA replication elements. Nucleic Acids Res 47: 9296–9312. doi: 10.1093/nar/gkz640
  • 31. Purzycka KJ, Legiewicz M, Matsuda E, Eizentstat LD, Lusvarghi S, et al. (2013) Exploring Ty1 retrotransposon RNA structure within virus-like particles. Nucleic Acids Res 41: 463–473. doi: 10.1093/nar/gks983
  • 32. Gumna J, Purzycka KJ, Ahn HW, Garfinkel DJ, Pachulska-Wieczorek K (2019) Retroviral-like determinants and functions required for dimerization of Ty1 retrotransposon RNA. RNA Biol 16: 1749–1763. doi: 10.1080/15476286.2019.1657370
  • 33. Blaszczyk L, Biesiada M, Saha A, Garfinkel DJ, Purzycka KJ (2017) Structure of Ty1 internally initiated RNA influences restriction factor expression. Viruses 9.
  • 34. McGinnis JL, Duncan CD, Weeks KM (2009) High-throughput SHAPE and hydroxyl radical analysis of RNA structure and ribonucleoprotein assembly. Methods Enzymol 468: 67–89. doi: 10.1016/S0076-6879(09)68004-6
  • 35. Novikova IV, Hennelly SP, Sanbonmatsu KY (2012) Structural architecture of the human long non-coding RNA, steroid receptor RNA activator. Nucleic Acids Res 40: 5034–5051. doi: 10.1093/nar/gks071
  • 36. Nishida Y, Pachulska-Wieczorek K, Blaszczyk L, Saha A, Gumna J, et al. (2015) Ty1 retrovirus-like element Gag contains overlapping restriction factor and nucleic acid chaperone functions. Nucleic Acids Res 43: 7414–7431. doi: 10.1093/nar/gkv695
  • 37. Pachulska-Wieczorek K, Blaszczyk L, Biesiada M, Adamiak RW, Purzycka KJ (2016) The matrix domain contributes to the nucleic acid chaperone activity of HIV-2 Gag. Retrovirology 13: 18. doi: 10.1186/s12977-016-0245-1
  • 38. Pang PS, Elazar M, Pham EA, Glenn JS (2011) Simplified RNA secondary structure mapping by automation of SHAPE data analysis. Nucleic Acids Res 39: e151. doi: 10.1093/nar/gkr773
  • 39. Yoon S, Kim J, Hum J, Kim H, Park S, et al. (2011) HiTRACE: high-throughput robust analysis for capillary electrophoresis. Bioinformatics 27: 1798–1805. doi: 10.1093/bioinformatics/btr277
  • 40. Cantara WA, Hatterschide J, Wu W, Musier-Forsyth K (2017) RiboCAT: a new capillary electrophoresis data analysis tool for nucleic acid probing. RNA 23: 240–249. doi: 10.1261/rna.058404.116
  • 41. Vasa SM, Guex N, Wilkinson KA, Weeks KM, Giddings MC (2008) ShapeFinder: a software system for high-throughput quantitative analysis of nucleic acid reactivity information resolved by capillary electrophoresis. RNA 14: 1979–1990. doi: 10.1261/rna.1166808
  • 42. Karabiber F, McGinnis JL, Favorov OV, Weeks KM (2013) QuShape: rapid, accurate, and best-practices quantification of nucleic acid probing information, resolved by capillary electrophoresis. RNA 19: 63–73. doi: 10.1261/rna.036327.112
  • 43. Low JT, Weeks KM (2010) SHAPE-directed RNA secondary structure prediction. Methods 52: 150–158. doi: 10.1016/j.ymeth.2010.06.007
  • 44. Lusvarghi S, Sztuba-Solinska J, Purzycka KJ, Rausch JW, Le Grice SF (2013) RNA secondary structure prediction using high-throughput SHAPE. J Vis Exp: e50243. doi: 10.3791/50243
  • 45. Reuter JS, Mathews DH (2010) RNAstructure: software for RNA secondary structure prediction and analysis. BMC Bioinformatics 11: 129. doi: 10.1186/1471-2105-11-129
  • 46. Huang Q, Purzycka KJ, Lusvarghi S, Li D, Legrice SF, et al. (2013) Retrotransposon Ty1 RNA contains a 5'-terminal long-range pseudoknot required for efficient reverse transcription. RNA 19: 320–332. doi: 10.1261/rna.035535.112
  • 47. Szachniuk M (2019) RNApolis: computational platform for RNA structure analysis. Foundations of Computing and Decision Sciences 44: 241–257.

Decision Letter 0

Danny Barash

9 Jun 2020

PONE-D-20-09557

RNAthor – fast, accurate normalization, visualization and statistical analysis of RNA probing data resolved by capillary electrophoresis

PLOS ONE

Dear Dr. Szachniuk,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

The revised manuscript should address all the critical points raised by all reviewers.

Please submit your revised manuscript by Jul 24 2020 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

We look forward to receiving your revised manuscript.

Kind regards,

Danny Barash

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: No

Reviewer #2: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: N/A

Reviewer #2: N/A

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Experimental data from RNA structure probing assays in the form of reactivities to structure-sensitive reagents can be integrated with RNA structure-prediction algorithms to improve prediction accuracy. To this end, the raw data is first processed through assay-specific pipelines to get reactivities of nucleotides of RNAs. The reactivities are normalized, and become the input to a structure-prediction algorithm along with the sequences of RNAs of interest. SHAPE-based probing followed by capillary electrophoresis is the traditional way to perform such experiments, and ShapeFinder is a popular tool to estimate reactivities from the electrophoresis data. RNAthor by Gumna et al. primarily serves the purpose of normalizing the reactivities from ShapeFinder and saving files that can be input to RNAstructure. It performs additional visualizations and some statistical tests.

Normalization and exploratory visualization of data are important steps of data analysis. Experimental biologists often struggle with this. Hence, a good interactive web application for this would be a great development. Normalization of such data must account for poor quality information for some nucleotides. Gumna et al. have implemented their empirically developed approach to identify and exclude such nucleotides from analysis. This is the primary technical contribution in this manuscript. However, the approach has not been validated or compared with existing approaches. Tests against some benchmark data and performance assessment are required. Besides, the authors have not described the logic behind their exclusion criteria but simply stated them as mathematical rules. Of the three criteria, the one I could interpret --- filtering negative reactivities --- is common practice in the field. The other two criteria have cutoff values for comparison of peak areas, but it has not been ascertained that they are optimal. They have simply been stated as prior empirical knowledge. Hence, the novelty is low.

Additionally, the authors claim that RNAthor "significantly reduces the time required for data analysis". However, this is not substantiated by any of the results. I'd rather say that an interactive application such as RNAthor could save the time spent writing scripts for data analysis. The current wording would be appropriate if the manuscript presented algorithmic advances that reduce time complexity of the analysis.

Further, the authors have motivated the manuscript in several places by claiming that SHAPE followed by capillary electrophoresis and analyzed by ShapeFinder is a widely recommended approach to study RNA structures. However, the references to support these assertions are often old papers from one particular lab. Hence, the application seems to be of limited interest.

Following are some other comments:

1. It would be nice to have a standalone version of the application.

2. What happens to data uploaded on the server if the user doesn't make an account? Is it also stored on your server for months? Please comment if there are any data security issues.

3. Page 10, line 207 says "averaged normalized SHAPE data with standard deviation". Is this referring to average across nucleotides or samples?

4. Page 11, lines 232-239 describe additional statistics computed by RNAthor. However, the purpose behind these specific statistical tests and what the users could do with them is missing.

Reviewer #2: The manuscript by Gumna et al. presents a web-based platform for quality control of RNA structure probing data obtained by experiments that combine SHAPE chemistry with capillary electrophoresis (CE) quantification of the SHAPE reaction’s cDNA products. The platform takes a SHAPE reactivity profile as input and performs automated data normalization, needed to bridge between different experimental conditions and different RNAs, and automated detection of unreliable data points. Users can also visualize the data and run a statistical test that aims to assess reproducibility and possibly also structural variation.

This work tackles a very important aspect of structure probing data analysis, in particular one that has not received sufficient attention to date. Normalization and other quality control steps remain a relatively unexplored area, and researchers often resort to one of a few popular strategies, which many find to be over-simplistic, too narrowly focused, and generally unsatisfactory. As such, this work has the potential to have real impact. However, more work is needed on the authors’ side to bring this work to its full potential, in particular, more testing and a solid convincing demonstration of the utility and validity of the proposed approach.

Several things are missing. First, there are many methods and tools for SHAPE data analysis, currently disregarded by this work. Only 2 software platforms, ShapeFinder and QuShape, are mentioned here, but so many other tools were developed over the last 5-6 years. It is true that newer tools were designed with Seq/MaP/MaPseq protocols in mind; however, this is irrelevant because the proposed platform accepts reactivities as input. It doesn’t matter how reactivities were obtained, as reactivities processed from next-gen or from CE platforms are still reactivities. It has also been shown multiple times (mainly by the Weeks lab) that next-gen-based reactivities have very similar statistical properties to the ones obtained by CE platforms. Accounting for other data processing platforms is important because they offer similar normalization routines, and in fact, most of them feature additional popular normalization routines. This, in turn, impacts the novelty of the proposed platform. What I find to be unique to this work is the authors’ approach to automating quality control, particularly the removal of potentially unreliable data points. I don’t think other platforms offer something similar, but this leads me to my second point, which is that the manuscript lacks any demonstration of the performance of the approach (or any other feature unique to this work, such as reproducibility assessment). I understand the authors have gained substantial experience analyzing structure probing data, but the fact that they believe their method “works well” is insufficient for publishing it. The only way to get users to try this out is by showing them, visually and also quantitatively, that the outputs are indeed robust and reliable. I have referred to this issue in more detail in my comments below.

Finally, I wonder why the authors limit consideration to both SHAPE data and CE-based platforms. I think the scope of their work could be extended relatively easily. As I mentioned above, any reactivity profile needs to be normalized and quality-controlled. Furthermore, the bigger a dataset is, the more critical automation is, so why not consider the plethora of datasets obtained by high-throughput sequencing-based experiments? I also don’t see a reason to limit this work to SHAPE data. There are many popular alternatives today, including DMS, HRF, and several SHAPE variants, such as NAI. I understand the authors may have tailored their automated QC routine to the special properties of SHAPE, but it is worth testing how well it does on DMS data. Additionally, as shown by several labs, SHAPE-CE, SHAPE-MaP, and SHAPE-Seq all generate very similar data, so why not extend the scope at least in the context of the SHAPE probe?

To summarize, this work needs to be revised to account for a large body of work from labs other than the Weeks lab and for recent advances in structure probing experimental and data analysis capabilities. However, at the same time, it could also leverage the recent expansion in the scope of structure probing to provide a tool of much broader applicability than its current designation, especially because it targets a step in data analysis which has not been adequately addressed to date. Detailed comments and suggestions are below.

There is some novelty in the automation of selection of reactivities for exclusion from analysis based on background signal. To the best of my knowledge, this type of quality control is normally done manually. However, the manuscript and the proposed tool target analysis of relatively short RNAs (no longer than ~300-500 nt) due to SHAPE/DMS and CE limitations. In such cases, manual/visual inspection of the signal and the background traces is not so time-consuming.

The authors also claim that the proposed automation of selection “works well for SHAPE experiments performed in vitro or in vivo” (page 8). There are two key issues with this statement. First, it is not supported by evidence. Second, it is not clear how the authors determine that their method “works well”. I understand that this statement is based on vast experience with SHAPE data analysis, but this is not convincing from a reader’s perspective, and the readers are your potential users. I would like to see numerous real SHAPE data traces (in vitro and in vivo) from more than one lab, for which the automated method works as well as manual correction. This, in turn, requires a suitable quantitative assessment metric. Since currently there is no consensus metric for evaluating “goodness” of SHAPE data, the authors could come up with their own metric, as long as it is appropriate and convincing. Some investigators use SHAPE-directed structure prediction accuracy to benchmark the performance of data processing pipelines. However, I don’t find that convincing (unless differences are truly dramatic) because the NNTM model introduces so much additional complexity and uncertainty to the output. Another way is to show that agreement between replicates improves after the automated routine is applied to the data. Microarray informatics method developers commonly use this approach to demonstrate that a proposed pre-processing step is effective. Existing measures of replicate agreement that are specialized to structure probing data are found in Choudhary et al., Bioinformatics, 2016 and Choudhary et al., Genome Biology, 2019. Alternatively, the authors could propose a novel quantitative measure that captures those data characteristics that make them think it “works well”.

Data normalization: The authors implemented the popular box-plot method, but they caution the reader to avoid using it for RNAs shorter than 300 nt (page 9). So what should a user do if he/she is studying a relatively short RNA? This lack of options severely limits the utility of the proposed platform. Please also see my other comment below regarding normalization strategies that other data analysis platforms offer. How about providing an alternative strategy for relatively short RNAs?

Some statements need to be toned down and/or revised. For example, “Currently, the most common method of RNA chemical probing is SHAPE used in conjunction with capillary electrophoresis (SHAPE-CE)” (Abstract). While it is true that SHAPE used to be the most popular method for a decade or so, DMS appears to be as popular as SHAPE nowadays. I think it is more appropriate to say that one popular reagent choice is SHAPE. Note that there is a similar statement in the Introduction section, which also needs to be reworded. Additionally, I think that over the last 2-3 years, SHAPE/DMS in conjunction with next-gen sequencing (via Seq or MaPseq protocols, and very recently, also via direct RNA nanopore sequencing) has become a much more popular choice than traditional CE-based structure probing. A quick literature search would reveal both the widespread use of DMS for modification and the widespread reliance on NGS for cDNA sequencing. Another statement I found to be somewhat outdated is “ShapeFinder is a popular computational tool for the extraction of quantitative SHAPE reactivity values from raw CE electropherograms.” To the best of my knowledge, many labs use QuShape (a newer and more automated SHAPE-CE analysis software from the Weeks lab) and others use in-house scripts. While ShapeFinder used to be the platform of choice for SHAPE-CE analysis, a quick literature search will show this is no longer the case, especially over the last 2-3 years. Note, however, that I acknowledge there are major issues with QuShape’s performance, as I know that many labs are unhappy with it and seek better alternatives. Finally, I also find the statement “… manual normalization of these values to a uniform scale and exclusion of unreliable data are both required before their usage by RNA structure prediction software” to be somewhat misleading because QuShape does feature reactivity normalization and might also allow users to exclude unreliable data (not sure about the latter, though). The way the abstract is worded, one might think that users currently have no software tools for normalizing the data and possibly also excluding unreliable measurements, and I don’t think that is indeed the case.

In continuation to my previous point, there are multiple software platforms, other than QuShape, which allow users to normalize structure probing data. In contrast to ShapeFinder and QuShape, these platforms were designed for NGS-based probing data, hence the initial input to these tools must take the form of reads, FASTA files, or read counts. However, once reactivities have been calculated, these software tools could be used to apply several different normalizations to them. In other words, reactivity normalization is independent of how reactivities are obtained. They may be obtained from CE or NGS data, using a variety of platforms, but once you have them, they can be normalized by numerous existing platforms. For example, SEQualyzer (Choudhary et al., Bioinformatics, 2016) features 2 normalization strategies: 2-8% and box-plot, RNA Framework (Incarnato et al., NAR, 2018) features 3 normalization strategies: 2-8%, box-plot, and 90% Winsorization, and StructureFold (Tang et al., Bioinformatics, 2015) features (via the Galaxy platform) 2-8% normalization and an option to cap reactivities. Some of these platforms also provide data visualization, for a single experiment and sometimes also for replicates. Some platforms are also open-source (e.g., SEQualyzer and RNA Framework) and this, in turn, allows users to directly use their normalization modules. All this prior work is missing from this manuscript, which might create an impression that users currently have no automated way of normalizing SHAPE/DMS reactivities, which is incorrect. Moreover, the most popular normalization strategies are likely 2-8% and box-plot and are described in detail in Sloma and Mathews, Methods Enzymology, 2015. In fact, one could easily implement them in a Matlab/R/Python script and I disagree with the authors that they “require significant user training and are … time consuming and prone to errors”.

Page 4, second sentence: the description of SHAPE is limited to traditional truncation-based SHAPE chemistry. However, since 2014, modifications are alternatively detected via the MaP approach, where the RT introduces mutations at modified sites. This strategy can only be used in conjunction with DNA sequencing, which is likely why it is not mentioned here. However, for the text to be more accurate and up-to-date, I think it should be mentioned.

Results, subsection “A brief overview of SHAPE-CE raw data analysis using ShapeFinder software”: I think this should be omitted or at the least moved to the Supp. Material because knowing how ShapeFinder works is not necessary for understanding the authors’ work. This is because both normalization and detection of unreliable data points (i.e., data QC) occur after the output from ShapeFinder has been obtained. Furthermore, ShapeFinder does not trigger the issues that normalization and data QC address. These issues are inherent to chemical modification experiments.

Results, subsection “RNAthor web application”: I find this material to be suitable for a user manual, especially the description of the various buttons, Reference page, Contact page, and Terms and Conditions page. These are not necessary for understanding the work or getting an idea of how user-friendly it is. It should be possible to compress this subsection into the most essential 4-5 sentences that refer to the ease of use and web interface. I also strongly encourage the authors to compile a short user manual and make it available on the platform’s webpage.

Results, subsection “Input Data”: Similar comments as above. The description is too detailed and should be substantially shortened (e.g., no need to mention which button to press).

Results, subsection “Output Data”: Similar comments as above.

Results, subsection “Output Data”: Note that most or all existing structure probing data analysis pipelines feature data visualization, not only ShapeFinder and QuShape. Some pipelines also allow users to visualize several reactivity profiles together.

Results, subsection “Additional Analysis of Normalized SHAPE-CE Data”: I agree with the authors that SHAPE/DMS data depart from a normal distribution. In fact, it has been rigorously established multiple times that this departure is quite significant (see, for example, Sukosd et al., NAR, 2014, Eddy, Ann. Reviews Biophysics, 2014, Deng et al., RNA, 2016). I therefore do not see why a test for normality is implemented. If the authors have SHAPE data that is nearly Gaussian, I would like to see it appended as Supp. Material.

Results, subsection “Additional Analysis of Normalized SHAPE-CE Data”: To facilitate differential reactivity analysis between two samples and/or to assess reproducibility across experiments, the authors implemented a Mann-Whitney U test, aka Wilcoxon rank-sum test. Note that a very similar test, namely, a Wilcoxon signed-rank test, was recently used for differential reactivity analysis in Choudhary et al., Genome Biology, 2019, albeit applied to similarity scores, not reactivities. The Wilcoxon rank-sum test makes an independence assumption, whereas the signed-rank version of the test relaxes this assumption. Can the authors justify their assumption that the compared samples are independent, especially if these are replicates of the same experiment? Additionally, note that the proposed test does not account for biological variation between samples, since one needs >= 2 replicates from each condition/experiment to assess biological variation per condition. What this means is that if one performs the experiment on two biological replicates (especially relevant to in vivo samples), the test might indicate that results are not reproducible because it picks up the biological variation in the RNA’s structure and its reactivity to the SHAPE reagent. I think it is important to alert users to the limitation of the proposed test and to emphasize what it is capable of detecting. This limitation is inherent to any differential analysis test that compares reactivity profiles without pre-assessing biological variability between samples in a condition.
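For concreteness, the distinction is easy to see with scipy. In the hypothetical snippet below, profile_a and profile_b stand for equal-length normalized reactivity vectors with unreliable positions already removed:

    from scipy import stats

    profile_a = [0.1, 0.9, 1.4, 0.2, 0.05, 1.1]   # placeholder reactivities
    profile_b = [0.2, 1.0, 1.2, 0.3, 0.10, 0.9]

    # Rank-sum (Mann-Whitney U): treats the two samples as independent.
    u_stat, p_u = stats.mannwhitneyu(profile_a, profile_b,
                                     alternative="two-sided")

    # Signed-rank (Wilcoxon): works on per-nucleotide differences and thus
    # relaxes the independence assumption for paired replicates.
    w_stat, p_w = stats.wilcoxon(profile_a, profile_b)

    print(f"rank-sum p={p_u:.3f}, signed-rank p={p_w:.3f}")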

As with the automated QC routine, I am missing a demonstration of the utility and the validity of the proposed statistical test. This could be done via benchmarking against existing methods (see, for example, Choudhary et al., Genome Biology, 2019 for a benchmark of recent methods). Potential users need to see some concrete evidence that the proposed test has statistical power. Alternatively, at the least, the authors should show examples of reproducible and irreproducible traces and their corresponding test scores as well as examples of how the test performs on biological replicates vs technical replicates. A comparison with existing reproducibility (QC) measures could also be helpful.

It is clear that the authors have gained tremendous experience analyzing CE-based structure probing data, and that the proposed platform is the result of years of empirical hands-on experience. This is invaluable, especially since there aren’t many labs with that level of expertise. For this reason, this work has the potential to lead to real advances in data analysis. In particular, the authors say “RNAthor was extensively tested on ShapeFinder output files (IPF) from published and unpublished SHAPE-CE experiments performed in our laboratory. It was also tested on IPF files from hydroxyl radicals and DMS probing experiments resolved by capillary electrophoresis.” I think what’s really missing in this manuscript, other than an up-to-date review of existing work, is a demonstration of improved performance on many real datasets from multiple probes, multiple conditions, multiple labs, and multiple RNAs. Such demonstration should also include comparisons to existing alternatives, and there are several ones other than ShapeFinder and QuShape.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.


Author response to Decision Letter 0


10 Jul 2020

We thank both Referees for a thorough reading of our manuscript “RNAthor – fast, accurate normalization, visualization and statistical analysis of RNA probing data resolved by capillary electrophoresis”. We considered all the remarks, and we substantially revised both the manuscript and the web server. The modified fragments of the manuscript are highlighted in red. Below, we respond to all comments of the Referees.

--------------------------------------------------------------------------------------------------------------------

Response to the comments of Referee #1:

Experimental data from RNA structure probing assays in the form of reactivities to structure-sensitive reagents can be integrated with RNA structure-prediction algorithms to improve prediction accuracy. To this end, the raw data is first processed through assay-specific pipelines to get reactivities of nucleotides of RNAs. The reactivities are normalized, and become the input to a structure-prediction algorithm along with the sequences of RNAs of interest. SHAPE-based probing followed by capillary electrophoresis is the traditional way to perform such experiments, and ShapeFinder is a popular tool to estimate reactivities from the electrophoresis data. RNAthor by Gumna et al. primarily serves the purpose of normalizing the reactivities from ShapeFinder and saving files that can be input to RNAstructure. It performs additional visualizations and some statistical tests.

Major comment 1: Normalization and exploratory visualization of data are important steps of data analysis. Experimental biologists often struggle with this. Hence, a good interactive web application for this would be a great development. Normalization of such data must account for poor quality information for some nucleotides. Gumna et al. have implemented their empirically developed approach to identify and exclude such nucleotides from analysis. This is the primary technical contribution in this manuscript. However, the approach has not been validated or compared with existing approaches. Tests against some benchmark data and performance assessment are required. Besides, the authors have not described the logic behind their exclusion criteria but simply stated them as mathematical rules. Of the three criteria, the one I could interpret --- filtering negative reactivities --- is common practice in the field. The other two criteria have cutoff values for comparison of peak areas, but it has not been ascertained that they are optimal. They have simply been stated as prior empirical knowledge. Hence, the novelty is low.
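To make this discussion concrete, rule-based screening of this kind might look as follows. This is a hypothetical sketch: the bg_fraction cutoff is a placeholder, not one of the manuscript’s actual values, which are stated only as empirical rules:

    import numpy as np

    def flag_unreliable(reaction_area, background_area, bg_fraction=0.5):
        """Flag nucleotides whose CE peak areas look unreliable (a sketch)."""
        rx = np.asarray(reaction_area, dtype=float)    # (+) reagent lane
        bg = np.asarray(background_area, dtype=float)  # (-) reagent lane
        unreliable = (rx - bg) < 0           # negative reactivity: common practice
        unreliable |= bg > bg_fraction * rx  # high background relative to signal,
                                             # e.g., an unspecific RT stop
        return unreliable                    # True = exclude (e.g., set to -999)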

Response: Let us note that no benchmark data from RNA probing are publicly available. Moreover, scientists working with capillary electrophoresis who publish their probing data share already-normalized values. Therefore, we tested RNAthor on “ShapeFinder output files (IPF) from published and unpublished SHAPE-CE experiments performed in our laboratory. It was also tested on IPF files from hydroxyl radicals and DMS probing experiments resolved by capillary electrophoresis.” This information was already given in the first version of the manuscript. However, we agree that we should have presented the results of our tests in the manuscript. We have filled this gap now: in the Results section, we describe selected tests, present their results, and compare the results generated by RNAthor with the results of RNA probing data analysis carried out manually by an expert. Note that such a comparison is the only possible one, since there are no other automated approaches that combine CE probing data normalization with unreliable data exclusion (this fact was also underlined by Referee 2).

Regarding the novelty, let us recall that RNAthor is the first tool providing automated selection of unspecific RT stops for CE-based RNA probing data. It also offers interactive detection of RT stops through the automatically generated histogram. Neither option is implemented in any other tool. Therefore, we believe the novelty is undeniable.

Finally, note that many modern computational tools and techniques are based on empirical knowledge, or on knowledge whose underlying logic is hard to extract and difficult for a human to understand; consider crowdsourcing systems or neural networks. No one challenges these solutions for their lack of easily explainable logic, as they provide satisfactory solutions to important combinatorial problems. Our method of unreliable data detection resulted from years of experience and a long-term decision process. The results of our manual analysis of RNA probing data, based on the rules now implemented in the RNAthor algorithm, were published in top scientific journals (e.g., Nature Communications, RNA Biology) and have never been challenged by experts in the field.

Major comment 2: Additionally, the authors claim that RNAthor "significantly reduces the time required for data analysis". However, this is not substantiated by any of the results. I'd rather say that an interactive application such as RNAthor could save the time spent writing scripts for data analysis. The current wording would be appropriate if the manuscript presented algorithmic advances that reduce time complexity of the analysis.

Response: We suppose that imprecise wording in the manuscript caused the Referee to misunderstand what we wanted to write here. Therefore, we modified this part of the manuscript. It is much easier and less time-consuming to use one program (like RNAthor) that processes the data than to run three other programs in sequence and analyze their results manually. Let us explain how an experimenter with ShapeFinder data would proceed to obtain the same results as provided by RNAthor: normalize the absolute SHAPE or DMS reactivities in a spreadsheet, generate the histogram of peak areas for the modification reaction and the background for each nucleotide, visually identify unreliable data, prepare the reactivity profiles with graphics software, run the statistical analysis with a statistical program, prepare a file in the SHAPE data format (*.shape) compatible with the structure prediction software, and run RNAstructure with the probing data as additional input. All these tasks take more time when done separately than when they run automatically in one pipeline, as in RNAthor. However, a reliable analysis of the time complexity of manual data processing is rather impossible. Let us recall that in computing science, time complexity (also known as computational complexity) is estimated from the number of basic operations (comparisons, simple mathematical operations, element swaps, etc.) performed by an algorithm to solve a problem. We can estimate the complexity of all the operations in RNAthor, but we cannot do so for manual data processing. The time spent on manual analysis strongly depends on the experience of the experimenter. Processing data from ShapeFinder can take one day for a beginner student and 1-2 hours for a trained experimenter who has a workflow and scripts ready for simple data processing. Using RNAthor, the same people will do it in 2-3 minutes (if this is not their first time using the tool) or 30 minutes (including time to read the Help).

Major comment 3: Further, the authors have motivated the manuscript in several places by claiming that SHAPE followed by capillary electrophoresis and analyzed by ShapeFinder is a widely recommended approach to study RNA structures. However, the references to support these assertions are often old papers from one particular lab. Hence, the application seems to be of limited interest.

Response: First, let us note that, following good and proper scientific practice, we have cited the original papers that present the ShapeFinder method; experienced scientists are aware that this is the right approach (citing the original publication rather than a paper that cites it). Obviously, these are papers from the lab where ShapeFinder was developed, and, obviously, they are not recent, because the method was introduced in 2008. These highly cited examples of ShapeFinder applications are often treated as milestones in RNA probing studies. However, to give examples of SHAPE-CE and ShapeFinder software usage, we also cited four papers published in 2017 and two papers from 2019.

Second, we never claimed that SHAPE followed by capillary electrophoresis and analyzed by ShapeFinder is a widely recommended approach to study RNA structures. In the paper, one could find the following statements: (a) SHAPE-CE with ShapeFinder analysis is a popular approach (see the citation data below); (b) CE serves as a gold-standard method for the single-nucleotide resolution of cDNA fragments from SHAPE or other RNA probing experiments; indeed, SHAPE NGS methods were validated using SHAPE-CE, e.g., the accuracy of SHAPE-MaP with ShapeMapper 1 or ShapeMapper 2, and of SHAPE-Seq, was tested using capillary electropherogram data (Siegfried et al. 2014, Busan et al. 2018, Lucks et al. 2011).

ShapeFinder is used in labs that perform capillary electrophoresis, and a quick look into the worldwide bibliography and the citation record of the original papers proves this fact: the first ShapeFinder paper has 160 citations in Web of Science and 220 in Google Scholar, with recent citations dated January 2020. Nevertheless, to address the Referee’s doubts, we have added references to other papers concerning studies performed using ShapeFinder and QuShape. We also added information on NGS-based RNA probing methods.

Minor comment 1: It would be nice to have a standalone version of the application.

Response: We have over 10 years of experience in developing bioinformatics programs. During this time, we have developed over 20 different systems for structural bioinformatics: 15 of them are web applications, and 5 are standalone programs. Some reviewers used to ask for a second version of the program: when we publish a web application, they ask for a standalone version, and when we release a standalone version, they require a web application. Such a request, however, should be justified in substance.

We always try to respond to the needs of our users, and after many years of experience, we know what is critical. A standalone version is very useful when it can be run from a command line. Such a version is needed when users process huge amounts of data at a time (which is not the case when processing data from capillary electrophoresis) and/or when they do not interfere in the computational process (i.e., do not set many parameters or make choices during the calculation). In the case of interactive programs (and RNAthor aims to be interactive), such versions are rather useless. Of course, there are also standalone versions with a GUI, not dedicated to high-throughput analysis. They require users to download the program, configure the environment, and often also download and install additional software or libraries; many users do not like this. Therefore, in many cases, the best choice is a web application. Let us recall the advantages of such a solution: all responsibility for maintaining the program and providing computing power lies with its authors, who can improve the program, remove bugs, add new functions, and make the updated version immediately available online (users are not asked to update a standalone version; the newest version is simply there), and users need not worry about whether the program is compatible with their operating system and working environment. The only things required to use a web application are an internet connection and a web browser; no problem for any scientist these days. We developed RNAthor as a web application for all of these reasons. We do not think that a standalone version is needed at this moment, but if many users request such a version, we will provide it in the future.

Minor comment 2: What happens to data uploaded on the server if the user doesn't make an account? Is it also stored on your server for months? Please comment if there any data security issues.

Response: The input data and the corresponding results are stored in the RNAthor system for 3 months for registered users (available from their workspace), for 3 days for guest users (available on the result page with a unique URL), and for 24 hours in the case of exemplary input processing (if the user clicks Run for example SHAPE data or Run for example DMS data; the results are available on the result page with a unique URL). Indeed, this information was not complete in the previous version of RNAthor’s Help file. It is now available in Help section “2.7. Output data”. All the information concerning data storage and security is also provided on the “Terms and conditions” page (see: http://rnathor.cs.put.poznan.pl/terms). A link to this page is placed in the bottom navigation bar of RNAthor.

Minor comment 3: Page 10, line 207 says "averaged normalized SHAPE data with standard deviation". Is this referring to average across nucleotides or samples?

Response: The average and standard deviation are calculated separately for each nucleotide, across samples. We have now added this information to the manuscript.

Minor comment 4: Page 11, lines 232-239 describe additional statistics computed by RNAthor. However, the purpose behind these specific statistical tests and what the users could do with them is missing.

Response: Statistical tests aim to assess reproducibility and possibly also structural variation. Combining statistical information from a probing experiment with the analysis of nucleotide reactivities and the thermodynamic model of RNA folding is also good and desirable practice. Such an approach can improve the accuracy of probing-directed structure modeling and allows additional information to be extracted from structure probing data, namely whether a base lies in a single-stranded or double-stranded region. One can find many examples of similar statistical tests used for SHAPE and DMS data analysis, e.g., Wilkinson K.A. et al., PLoS Biol., 2008; Purzycka K.J. et al., Nucleic Acids Res., 2013; Li P. et al., Cell Host Microbe, 2018; Huber R.G. et al., Nat. Commun., 2019; Simon L.M. et al., Nucleic Acids Res., 2019, and many others. We reworded the fragment concerning the statistical analysis, and we hope this information is much clearer now.

--------------------------------------------------------------------------------------------------------------------

Response to the comments of Referee #2:

The manuscript by Gumna et al. presents a web-based platform for quality control of RNA structure probing data obtained by experiments that combine SHAPE chemistry with capillary electrophoresis (CE) quantification of the SHAPE reaction’s cDNA products. The platform takes a SHAPE reactivity profile as input and performs automated data normalization, needed to bridge between different experimental conditions and different RNAs, and automated detection of unreliable data points. Users can also visualize the data and run a statistical test that aims to assess reproducibility and possibly also structural variation.

This work tackles a very important aspect of structure probing data analysis, in particular one that has not received sufficient attention to date. Normalization and other quality control steps remain a relatively unexplored area, and researchers often resort to one of a few popular strategies, which many find to be over-simplistic, too narrowly focused, and generally unsatisfactory. As such, this work has the potential to have real impact. However, more work is needed on the authors’ side to bring this work to its full potential, in particular, more testing and a solid convincing demonstration of the utility and validity of the proposed approach.

Comment 1: Several things are missing. First, there are many methods and tools for SHAPE data analysis, currently disregarded by this work. Only 2 software platforms, ShapeFinder and QuShape, are mentioned here, but so many other tools were developed over the last 5-6 years. It is true that newer tools were designed with Seq/MaP/MaPseq protocols in mind; however, this is irrelevant because the proposed platform accepts reactivities as input. It doesn’t matter how reactivities were obtained, as reactivities processed from next-gen or from CE platforms are still reactivities. It has also been shown multiple times (mainly by the Weeks lab) that next-gen-based reactivities have very similar statistical properties to those obtained by CE platforms. Accounting for other data processing platforms is important because they offer similar normalization routines, and in fact, most of them feature additional popular normalization routines. This, in turn, impacts the novelty of the proposed platform. What I find to be unique to this work is the authors’ approach to automating quality control, particularly the removal of potentially unreliable data points. I don’t think other platforms offer something similar, but this leads me to my second point, which is that the manuscript lacks any demonstration of the performance of the approach (or any other feature unique to this work, such as reproducibility assessment). I understand the authors have gained substantial experience analyzing structure probing data, but the fact that they believe their method “works well” is insufficient for publishing it. The only way to get users to try this out is by showing them, visually and also quantitatively, that the outputs are indeed robust and reliable. I have referred to this issue in more detail in my comments below.

Response: First, let us recall that RNAthor has been developed for automatic normalization, visualization, and statistical analysis of CE-based RNA probing data, not for NGS-based data analysis. Therefore, the tools dedicated to NGS-based probing experiments were not mentioned in the original manuscript. However, we have written about them in the revised version.

Second, we found the Referee’s suggestion to test the RNAthor normalization function on NGS data very interesting. We investigated the problem, but we found that NGS platforms have their own normalization tools. Therefore, it is not necessary to implement additional high-throughput data analysis functions in RNAthor, a tool designed for low- and medium-throughput CE-based data processing. Moreover, a comparison of the normalization done by RNAthor and by NGS platforms is impossible: the platforms for NGS data analysis do not accept ShapeFinder output files (we tested this for ShapeMapper and RNA Framework). Interestingly, although QuShape provides an option to analyze ShapeFinder files, it does not work. Summing up, the suggestion that RNAthor should analyze all possible data is out of the question, since even the existing, widely used systems do not do so.

However, we significantly extended the scope of RNAthor by adding new tools that facilitate automatic exclusion of unreliable data and normalization of QuShape output files. We added the possibility to process DMS probing data. Moreover, users can obtain the secondary structure predicted for the RNA sequence provided as input. RNAthor automatically executes the incorporated RNAstructure algorithm that supports SHAPE/DMS data-driven prediction. The graphical diagram of the structure is colored according to the color scheme defined for the default reactivity ranges. The output structure is also encoded in dot-bracket notation.
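For readers who wish to reproduce this step outside RNAthor, the equivalent manual invocation of RNAstructure could look roughly as follows. This is a sketch assuming a local RNAstructure installation with its command-line tools on the PATH; the file names are placeholders:

    import subprocess

    # rna.fasta: the input sequence; rna.shape: normalized reactivities in the
    # SHAPE data format (*.shape) produced by the normalization step.
    subprocess.run(["Fold", "rna.fasta", "rna.ct", "--SHAPE", "rna.shape"],
                   check=True)
    # Convert structure #1 in the CT output to dot-bracket notation.
    subprocess.run(["ct2dot", "rna.ct", "1", "rna.dbn"], check=True)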

Third, we tested RNAthor on “ShapeFinder output files (IPF) from published and unpublished SHAPE-CE experiments performed in our laboratory. It was also tested on IPF files from hydroxyl radicals and DMS probing experiments resolved by capillary electrophoresis.” This information was already given in the first version of the manuscript. However, we agree that we should have presented the results of our tests in the manuscript. We have filled this gap now: in the Results section, we describe selected tests, present their results, and compare the results generated by RNAthor with the results of RNA probing data analysis carried out manually by an expert. Note that such a comparison is the only possible one, since there are no other automated approaches that combine CE probing data normalization with unreliable data exclusion.

Comment 2: Finally, I wonder why the authors limit consideration to both SHAPE data and CE-based platforms. I think the scope of their work could be extended relatively easily. As I mentioned above, any reactivity profile needs to be normalized and quality-controlled. Furthermore, the bigger a dataset is, the more critical automation is, so why not consider the plethora of datasets obtained by high-throughput sequencing-based experiments? I also don’t see a reason to limit this work to SHAPE data. There are many popular alternatives today, including DMS, HRF, and several SHAPE variants, such as NAI. I understand the authors may have tailored their automated QC routine to the special properties of SHAPE, but it is worth testing how well it does on DMS data. Additionally, as shown by several labs, SHAPE-CE, SHAPE-MaP, and SHAPE-Seq all generate very similar data, so why not extend the scope at least in the context of the SHAPE probe?

Response: RNAthor was never limited to SHAPE data. In the first version of the manuscript, we already wrote that RNAthor was extensively tested using SHAPE, DMS, and hydroxyl radical data. To make this clear now, we have added a DMS example in the web server, so that users can try the tool on these data as well. We have also added an option enabling automatic selection of DMS reactivities for exclusion. See also our response to Comment 1 concerning NGS data processing.

Comment 3: To summarize, this work needs to be revised to account for a large body of work from labs other than the Weeks lab and for recent advances in structure probing experimental and data analysis capabilities. However, at the same time, it could also leverage the recent expansion in the scope of structure probing to provide a tool of much broader applicability than its current designation, especially because it targets a step in data analysis which has not been adequately addressed to date. Detailed comments and suggestions are below.

Response: Please see our responses to Comments 1 and 2.

Comment 4: There is some novelty in the automation of selection of reactivities for exclusion from analysis based on background signal. To the best of my knowledge, this type of quality control is normally done manually. However, the manuscript and the proposed tool target analysis of relatively short RNAs (no longer than ~300-500 nt) due to SHAPE/DMS and CE limitations. In such cases, manual/visual inspection of the signal and the background traces is not so time-consuming.

Response: Let us explain that ~300-500 nt is the length of one CE read, not the length of the analyzed RNA. Users can process much longer RNAs; using normalized overlapping CE reads, we analyze RNAs more than 5,000 nucleotides long (Andrzejewska et al., in revision). RNAthor is not limited to short RNAs; we did not write anything like this in the manuscript. Note also that RNAthor offers more than just automated selection of reactivities. We recommend looking at the charts, violin plots, or box plots prepared by our tool and trying the statistical analysis. These are very useful features of RNAthor. We find it hard to believe that anyone does all of this manually, even for relatively short RNAs.

Comment 5: The authors also claim that the proposed automation of selection “works well for SHAPE experiments performed in vitro or in vivo”(page 8). There are two key issues with this statement. First, it is not supported by evidence. Second, it is not clear how the authors determine that their method “works well”. I understand that this statement is based on vast experience with SHAPE data analysis, but this is not convincing from a reader’s perspective, and the readers are your potential users. I would like to see numerous real SHAPE data traces (in vitro and in vivo) from more than one lab, for which the automated method works as well as manual correction. This, in turn, requires a suitable quantitative assessment metric. Since currently there is no consensus metric for evaluating “goodness” of SHAPE data, the authors could come up with their own metric, as long as it is appropriate and convincing. Some investigators use SHAPE-directed structure prediction accuracy to benchmark the performance of data processing pipelines. However, I don’t find that convincing (unless differences are truly dramatic) because the NNTM model introduces so much additional complexity and uncertainty to the output. Another way is to show that agreement between replicates improves after the automated routine is applied to the data. Microarray informatics method developers commonly use this approach to demonstrate that a proposed pre-processing step is effective. Existing measures of replicate agreement that are specialized to structure probing data are found in Choudhary et al., Bioinformatics, 2016 and Choudhary et al., Genome Biology, 2019. Alternatively, the authors could propose a novel quantitative measure that captures those data characteristics that make them think it “works well”.
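As one concrete possibility, even a simple correlation-based check would help. A sketch, with Spearman correlation used only as an illustrative metric (not one mandated by the cited works):

    import numpy as np
    from scipy import stats

    def replicate_agreement(rep1, rep2, excluded):
        """Spearman correlation between replicate reactivity profiles, before
        and after dropping positions flagged by an automated QC routine
        (excluded=True marks flagged nucleotides)."""
        r1, r2, m = (np.asarray(a) for a in (rep1, rep2, excluded))
        before = stats.spearmanr(r1, r2)[0]        # correlation coefficient
        after = stats.spearmanr(r1[~m], r2[~m])[0]
        return before, after   # agreement should improve if QC is effective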

Response: First, let us note that in this comment the Referee contradicts what he/she wrote in Comment 1. Let us quote from Comment 1: “It doesn’t matter how reactivities were obtained…”, “reactivity normalization is independent of how reactivities are obtained”. According to this, reactivities obtained in vitro, in vivo, ex vivo, in virio, or ex virio are still reactivities and do not require separate routines for normalization or for the selection of unreliable data.

Second, we agree that we should have provided the results of RNAthor testing. Therefore, we revised the Results section and included the results of computational tests performed with RNAthor to validate its reliability. Let us add that no benchmark data from RNA probing are publicly available. Generally, if authors provide reactivity values as supplementary data, they are already normalized, with unreliable reactivities set to -999. Therefore, extensive computational tests of our tool were hardly possible based on what we found in the worldwide repositories of RNA probing data. To validate the automatic algorithm of RNAthor, we used in vitro and in vivo SHAPE data traces from our lab, which come from real experiments and are of very good quality. Researchers from our labs have a solid background and years of experience in RNA structure probing using SHAPE-CE, DMS-CE, and HR-CE. We checked how other tools for RNA probing data analysis (ShapeMapper 1 and 2, ShapeFinder, QuShape, RNA Framework) were validated and found that these tools were also tested using diverse RNAs probed only in the labs of the software’s authors.

Finally, the approach discussed by the Referee, measuring replicate agreement (SEQualyzer), is intended for quality control and exploratory analysis of high-throughput RNA structural profiling data, whereas RNAthor was designed for low-throughput data analysis and visualization. Therefore, we do not find it useful in our case.

Comment 6: Data normalization: The authors implemented the popular box-plot method, but they caution the reader to avoid using it for RNAs shorter than 300 nt (page 9). So what should a user do if he/she is studying a relatively short RNA? This lack of options severely limits the utility of the proposed platform. Please also see my other comment below regarding normalization strategies that other data analysis platforms offer. How about providing an alternative strategy for relatively short RNAs?

Response: We thank the Referee for this comment. We improved the normalization method in RNAthor to allow normalization of short RNAs as well. Following a common approach in the box-plot method of normalization, outliers are determined as follows: for RNAs longer than 100 nucleotides, no more than 10% of the data are removed, while for shorter RNAs a maximum of 5% of the data are removed. We added this information to the manuscript.
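For clarity, a minimal sketch of this length-dependent rule, assuming the standard box-plot convention (values more than 1.5 interquartile ranges above the third quartile are outliers) and the usual normalization factor (the mean of the top 10% of retained values):

    import numpy as np

    def boxplot_normalize_capped(reactivities):
        """Box-plot normalization with the outlier cap described above."""
        r = np.asarray(reactivities, dtype=float)
        valid = np.sort(r[~np.isnan(r)])[::-1]             # descending
        n = len(valid)
        q1, q3 = np.percentile(valid, [25, 75])
        n_out = int(np.sum(valid > q3 + 1.5 * (q3 - q1)))  # box-plot outliers
        cap = int(0.10 * n) if n > 100 else int(0.05 * n)  # removal cap
        kept = valid[min(n_out, cap):]
        top10 = kept[:max(1, int(np.ceil(0.10 * len(kept))))]
        return r / top10.mean()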

Comment 7: Some statements need to be toned down and/or revised. For example, “Currently, the most common method of RNA chemical probing is SHAPE used in conjunction with capillary electrophoresis (SHAPE-CE)” (Abstract). While it is true that SHAPE used to be the most popular method for a decade or so, DMS appears to be as popular as SHAPE nowadays. I think it is more appropriate to say that one popular reagent choice is SHAPE. Note that there is a similar statement in the Introduction section, which also needs to be reworded. Additionally, I think that over the last 2-3 years, SHAPE/DMS in conjunction with next-gen sequencing (via Seq or MaPseq protocols, and very recently, also via direct RNA nanopore sequencing) has become a much more popular choice than traditional CE-based structure probing. A quick literature search would reveal both the widespread use of DMS for modification and the widespread reliance on NGS for cDNA sequencing. Another statement I found to be somewhat outdated is “ShapeFinder is a popular computational tool for the extraction of quantitative SHAPE reactivity values from raw CE electropherograms.” To the best of my knowledge, many labs use QuShape (a newer and more automated SHAPE-CE analysis software from the Weeks lab) and others use in-house scripts. While ShapeFinder used to be the platform of choice for SHAPE-CE analysis, a quick literature search will show this is no longer the case, especially over the last 2-3 years. Note, however, that I acknowledge there are major issues with QuShape’s performance, as I know that many labs are unhappy with it and seek better alternatives. Finally, I also find the statement “… manual normalization of these values to a uniform scale and exclusion of unreliable data are both required before their usage by RNA structure prediction software” to be somewhat misleading because QuShape does feature reactivity normalization and might also allow users to exclude unreliable data (not sure about the latter, though). The way the abstract is worded, one might think that users currently have no software tools for normalizing the data and possibly also excluding unreliable measurements, and I don’t think that is indeed the case.

Response: Following the Referee’s suggestion, we have revised many statements in the manuscript. Moreover, we have substantially extended the functionality of RNAthor: currently, it can also process DMS data and SHAPE data obtained from QuShape. We find ShapeFinder a popular tool, but it is true that QuShape also has many supporters. Let us look at the facts: according to the Web of Science, the ShapeFinder paper has 160 citations (the most recent from January 2020), and the QuShape paper has 109 citations. As for in-house scripts, it is rather hard to discuss their popularity, since such solutions are used only locally.

Comment 8: In continuation of my previous point, there are multiple software platforms, other than QuShape, which allow users to normalize structure probing data. In contrast to ShapeFinder and QuShape, these platforms were designed for NGS-based probing data, hence the initial input to these tools must take the form of reads, FASTA files, or read counts. However, once reactivities have been calculated, these tools could be used to apply several different normalizations to them. In other words, reactivity normalization is independent of how reactivities are obtained. They may be obtained from CE or NGS data, using a variety of platforms, but once you have them, they can be normalized by numerous existing platforms. For example, SEQualyzer (Choudhary et al., Bioinformatics, 2016) features two normalization strategies (2-8% and box-plot); RNA Framework (Incarnato et al., NAR, 2018) features three (2-8%, box-plot, and 90% Winsorization); and StructureFold (Tang et al., Bioinformatics, 2015) features, via the Galaxy platform, 2-8% normalization and an option to cap reactivities. Some of these platforms also provide data visualization, for a single experiment and sometimes also for replicates. Some platforms are also open-source (e.g., SEQualyzer and RNA Framework), and this, in turn, allows users to directly use their normalization modules. All this prior work is missing from this manuscript, which might create the impression that users currently have no automated way of normalizing SHAPE/DMS reactivities, which is incorrect. Moreover, the most popular normalization strategies are likely 2-8% and box-plot, and they are described in detail in Sloma and Mathews, Methods Enzymology, 2015. In fact, one could easily implement them in a Matlab/R/Python script, and I disagree with the authors that they “require significant user training and are … time consuming and prone to errors”.

Response: First, let us note that in this comment the Referee contradicts what he/she wrote at the beginning of the second paragraph of THIS review: “This work tackles a very important aspect of structure probing data analysis, in particular one that has not received sufficient attention to date. Normalization and other quality control steps remain a relatively unexplored area, and researchers often resort to one of a few popular strategies, which many find to be over-simplistic, too narrowly focused, and generally unsatisfactory.” Second, in our opinion, the analysis of NGS-based data is outside the scope of this work. See also our responses to Comments 1, 2, and 5.

Comment 9: Page 4, second sentence: the description of SHAPE is limited to traditional truncation-based SHAPE chemistry. However, since 2014, modifications are alternatively detected via the MaP approach, where the RT introduces mutations at modified sites. This strategy can only be used in conjunction with DNA sequencing, which is likely why it is not mentioned here. However, for the text to be more accurate and up-to-date, I think it should be mentioned.

Response: We thank the Referee for this comment. We introduced additional descriptions in the manuscript.

Comment 10: Results, subsection “A brief overview of SHAPE-CE raw data analysis using ShapeFinder software”: I think this should be omitted or at the least moved to the Supp. Material because knowing how ShapeFinder works is not necessary for understanding the authors’ work. This is because both normalization and detection of unreliable data points (i.e., data QC) occur after the output from ShapeFinder has been obtained. Furthermore, ShapeFinder does not trigger the issues that normalization and data QC address. These issues are inherent to chemical modification experiments.

Response: Indeed, the Results section was not the right place to describe the analysis of SHAPE-CE data with ShapeFinder. We substantially revised the manuscript and this part was removed.

Comment 11: Results, subsection “RNAthor web application”: I find this material to be suitable for a user manual, especially the description of the various buttons, Reference page, Contact page, and Terms and Conditions page. These are not necessary for understanding the work or getting an idea of how user-friendly it is. It should be possible to compress this subsection into the most essential 4-5 sentences that refer to the ease of use and web interface. I also strongly encourage the authors to compile a short user manual and make it available on the platform’s webpage.

Results, subsection “Input Data”: Similar comments as above. The description is too detailed and should be substantially shortened (e.g., no need to mention which button to press).

Results, subsection “Output Data”: Similar comments as above.

Results, subsection “Output Data”: Note that most or all existing structure probing data analysis pipelines feature data visualization, not only ShapeFinder and QuShape. Some pipelines also allow users to visualize several reactivity profiles together.

Response: According to the Referee’s suggestion, we have removed the detailed description of the web application from the manuscript. Also the fragments about input and output data were reduced. Some fragments were moved to Help, which already existed before, but many details were missing there. Therefore, the RNAthor Help was also substantially extended.

Comment 12: Results, subsection “Additional Analysis of Normalized SHAPE-CE Data”: I agree with the authors that SHAPE/DMS data depart from a normal distribution. In fact, it has been rigorously established multiple times that this departure is quite significant (see, for example, Sukosd et al., NAR, 2014, Eddy, Ann. Reviews Biophysics, 2014, Deng et al., RNA, 2016). I therefore do not see why a test for normality is implemented. If the authors have SHAPE data that is nearly Gaussian, I would like to see it appended as Supp. Material.

Response: We did not intend to suggest that our SHAPE data are nearly Gaussian. We have applied the standard pathway of statistical analysis here.

Comment 13: Results, subsection “Additional Analysis of Normalized SHAPE-CE Data”: To facilitate differential reactivity analysis between two samples and/or to assess reproducibility across experiments, the authors implemented a Mann-Whitney U test, aka Wilcoxon rank-sum test. Note that a very similar test, namely, a Wilcoxon signed-rank test, was recently used for differential reactivity analysis in Choudhary et al., Genome Biology, 2019, albeit applied to similarity scores, not reactivities. The Wilcoxon rank-sum test makes an independence assumption, whereas the signed-rank version of the test relaxes this assumption. Can the authors justify their assumption that the compared samples are independent, especially if these are replicates of the same experiment? Additionally, note that the proposed test does not account for biological variation between samples, since one needs >= 2 replicates from each condition/experiment to assess biological variation per condition. What this means is that if one performs the experiment on two biological replicates (especially relevant to in vivo samples), the test might indicate that results are not reproducible because it picks up the biological variation in the RNA’s structure and its reactivity to the SHAPE reagent. I think it is important to alert users to the limitation of the proposed test and to emphasize what it is capable of detecting. This limitation is inherent to any differential analysis test that compares reactivity profiles without pre-assessing biological variability between samples in a condition.

Response: RNAthor incorporates statistical tests used by many labs (including the Weeks lab) in which CE experiments are performed. These experiments yield independent samples, hence the choice of the non-parametric Mann-Whitney test (if the user selects 2 experiments) and the Kruskal-Wallis rank-sum test (for 3-5 experiments). Let us underline that these tests are not our invention; we implemented solutions already applied successfully in many laboratories.
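For illustration, a sketch of this selection logic using scipy; the profiles below are placeholders for equal-length normalized reactivity vectors, one per experiment:

    from scipy import stats

    def compare_experiments(profiles):
        """Mann-Whitney U for 2 experiments, Kruskal-Wallis for 3-5."""
        if len(profiles) == 2:
            stat, p = stats.mannwhitneyu(*profiles, alternative="two-sided")
            return "Mann-Whitney U", stat, p
        if 3 <= len(profiles) <= 5:
            stat, p = stats.kruskal(*profiles)
            return "Kruskal-Wallis", stat, p
        raise ValueError("expected 2-5 experiments")

    test, stat, p = compare_experiments([[0.1, 0.8, 1.2], [0.2, 0.9, 1.1]])
    print(test, round(p, 3))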

Comment 14: As with the automated QC routine, I am missing a demonstration of the utility and the validity of the proposed statistical test. This could be done via benchmarking against existing methods (see, for example, Choudhary et al., Genome Biology, 2019 for a benchmark of recent methods). Potential users need to see some concrete evidence that the proposed test has statistical power. Alternatively, at the least, the authors should show examples of reproducible and irreproducible traces and their corresponding test scores as well as examples of how the test performs on biological replicates vs technical replicates. A comparison with existing reproducibility (QC) measures could also be helpful.

Response: Unfortunately, we do not understand this comment. Why should we validate and demonstrate the statistical power of statistical tests that have existed for years? This is the duty of the authors of the tests, not of their users. See also our response to the previous comment.

Comment 15: It is clear that the authors have gained tremendous experience analyzing CE-based structure probing data, and that the proposed platform is the result of years of empirical hands-on experience. This is invaluable, especially since there aren’t many labs with that level of expertise. For this reason, this work has the potential to lead to real advances in data analysis. In particular, the authors say “RNAthor was extensively tested on ShapeFinder output files (IPF) from published and unpublished SHAPE-CE experiments performed in our laboratory. It was also tested on IPF files from hydroxyl radicals and DMS probing experiments resolved by capillary electrophoresis.” I think what’s really missing in this manuscript, other than an up-to-date review of existing work, is a demonstration of improved performance on many real datasets from multiple probes, multiple conditions, multiple labs, and multiple RNAs. Such demonstration should also include comparisons to existing alternatives, and there are several ones other than ShapeFinder and QuShape.

Response: We thank the Referee for appreciating our expertise in analyzing CE-based data. We would very much like to test our tool on many data sets from other laboratories. Unfortunately, it is not possible, because other labs do not publish raw data from capillary electrophoresis. We have put a lot of effort into searching for such data and contacting other laboratories to obtain data for testing, but we were unsuccessful. We also searched the literature, since some authors claimed to provide their raw data in supplementary materials. Unfortunately, it turned out that even where the authors of a publication attached supplementary files described as raw data, the files contained normalized data. Therefore, it is not possible to demonstrate the performance of RNAthor for multiple probes from external labs. We could only do it for data obtained in our laboratory, and we had already done so before our paper submission. However, we agree that a demonstration of the tool’s performance is important. Thus, we substantially revised the Results section and described exemplary comparative experiments performed with RNAthor. We compared the results obtained by RNAthor with the results of manual analysis done by an expert, for exemplary SHAPE and DMS data.

Attachment

Submitted filename: Response-to-referees.pdf

Decision Letter 1

Danny Barash

27 Aug 2020

PONE-D-20-09557R1

RNAthor – fast, accurate normalization, visualization and statistical analysis of RNA probing data resolved by capillary electrophoresis

PLOS ONE

Dear Dr. Szachniuk,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

The revised manuscript should address all the critical points raised by all reviewers.

Please submit your revised manuscript by Oct 11 2020 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

We look forward to receiving your revised manuscript.

Kind regards,

Danny Barash

Academic Editor

PLOS ONE


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: (No Response)

Reviewer #2: (No Response)

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Partly

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: N/A

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: I thank the authors for considering my suggestions. The revised manuscript presents a better case for the utility of RNAthor. The authors have explained why it is not necessary to have a standalone version of RNAthor at this time, added relevant references, as well as addressed my other minor comments.

A good number of experimental biologists are indeed unskilled at even basic data analysis tasks. Such biologists could benefit from a web server where they can upload outputs of ShapeFinder or QuShape and save normalized data, relevant figures and results from statistical tests. Hence, as a software tool RNAthor seems to have utility.

The primary contribution of the manuscript is a web server that combines existing functionalities from other sources. The method for automatic identification of unreliable data is an implementation of the rules that they have utilized previously (e.g., in their articles in Nature Communications, RNA Biology). The statistical tests that they perform have been used by other researchers active in the field. So overall, there appears to be no novelty with regard to the methods.

In my previous review, my major concern was that the method to identify “unreliable probing data” has not been validated. In revision, the authors have shown that the results from RNAthor are comparable to those from their manual analysis. This serves as evidence that there are no bugs in the software and that the rules followed by the human analyst have been faithfully implemented. The differences that they observed can be explained by subjectivity of manual analysis. However, objective validation of the method is still lacking.

The authors claim that comparison of RNAthor with manual analysis is the only possible validation. I believe that at the very least, the authors could test a range of cutoffs to identify and exclude unreliable data. The results from different cutoffs could be objectively compared by examining the accuracy in reproducing well-studied RNA structures, or other biological results that are widely believed in the field to be true. In my assessment, stating that an implemented algorithm is based on experience is not enough to claim novelty of the method. To make such a claim, the authors must demonstrate that out of a set of other plausible and reasonable methods, the method implemented in the web server is the one that performs the best. The authors can take a look at the following paper from the Laederach lab as an example. In this article, Woods and Laederach automated some of the rules used by humans for manual analysis. They tested a range of algorithms to identify the one that performs the best.

Woods, C. T., & Laederach, A. (2017). Classification of RNA structure change by ‘gazing’at experimental data. Bioinformatics, 33(11), 1647-1655.

In summary,

1. I find the article acceptable if it is classified as reporting a software tool. I’d recommend that the authors tone down their claim of novelty with regards to “an algorithm for the automatic identification of unreliable probing data” as this claim requires objective validations that are not there. I understand that the authors may have added this claim in response to my earlier comment about lack of novelty. My comment was based on assessing the manuscript as a methods research article, which may not be the intention of the authors. So it is best to remove this claim. Also, the authors should make the raw data used for figures 3-4 available as supplementary data, or provide links if they are available online.

2. To be accepted as a methods research article, major revisions are still needed to demonstrate that the method is indeed optimal and applicable for general use. Lack of gold standards to objectively evaluate methods for analysis of RNA structure-probing data is a challenge faced by all methods researchers active in the field. However, the community has also found acceptable ways for such evaluation. If the authors did indeed mean to publish RNAthor as a methods research article, I hope that the authors will borrow ideas from other manuscripts and consider validating their method.

Reviewer #2: The authors added several useful features to their web server, such as options to perform and visualize data-directed secondary structure prediction and to analyze DMS data. Hopefully, this would make the proposed tool more appealing to potential users, although overall, the main novelty in this work remains a fairly simple routine for automated detection of unreliable data points. A comparison between manual and automated data processing helps demonstrate that the automated detection is reliable/judicious, although I would expect to see more examples if I were a potential user, or at least manual analysis by additional experts (not the makers of the tool). Regarding the optional statistical tests, I think there are more statistically-sound and powerful differential reactivity analysis methods out there, and I don’t feel what’s offered by this tool is very powerful.

Finally, please note that the graphics are of poor quality and should be improved.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

Author response to Decision Letter 1


31 Aug 2020

Response to the comments of Referee #1:

COMMENT:

I thank the authors for considering my suggestions. The revised manuscript presents a better case for the utility of RNAthor. The authors have explained why it is not necessary to have a standalone version of RNAthor at this time, added relevant references, as well as addressed my other minor comments. A good number of experimental biologists are indeed unskilled at even basic data analysis tasks. Such biologists could benefit from a web server where they can upload outputs of ShapeFinder or QuShape and save normalized data, relevant figures and results from statistical tests. Hence, as a software tool RNAthor seems to have utility.

The primary contribution of the manuscript is a web server that combines existing functionalities from other sources. The method for automatic identification of unreliable data is an implementation of the rules that they have utilized previously (e.g., in their articles in Nature Communications, RNA Biology). The statistical tests that they perform have been used by other researchers active in the field. So overall, there appears to be no novelty with regards to the methods.

In my previous review, my major concern was that the method to identify “unreliable probing data” has not been validated. In revision, the authors have shown that the results from RNAthor are comparable to those from their manual analysis. This serves as evidence that there are no bugs in the software and that the rules followed by the human analyst have been faithfully implemented. The differences that they observed can be explained by subjectivity of manual analysis. However, objective validation of the method is still lacking.

The authors claim that comparison of RNAthor with manual analysis is the only possible validation. I believe that at the very least, the authors could test a range of cutoffs to identify and exclude unreliable data. The results from different cutoffs could be objectively compared by examining the accuracy in reproducing well-studied RNA structures, or other biological results that are widely believed in the field to be true. In my assessment, stating that an implemented algorithm is based on experience is not enough to claim novelty of the method. To make such a claim, the authors must demonstrate that out of a set of other plausible and reasonable methods, the method implemented in the web server is the one that performs the best. The authors can take a look at the following paper from the Laederach lab as an example. In this article, Woods and Laederach automated some of the rules used by humans for manual analysis. They tested a range of algorithms to identify the one that performs the best.

Woods, C. T., & Laederach, A. (2017). Classification of RNA structure change by ‘gazing’at experimental data. Bioinformatics, 33(11), 1647-1655.

RESPONSE:

We agree that presenting the results of extensive and sophisticated computational tests of a new program significantly increases its credibility. However, it is difficult to perform many such tests because other laboratories do not make their raw data publicly available. The idea of testing a range of cutoffs is very interesting. It is quite easy to implement for a single-criterion problem, but our procedure is a multi-criteria decision problem. RNAthor has several input parameters (which the user can turn on/off) and internal parameters (bgArea, areaDifference, effectiveMaximum); these depend on one another, and all of them impact the results. We do not see a feasible way to test all these dependencies in a reasonable time.
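
As a rough illustration of the combinatorial cost described above, an exhaustive grid over interdependent parameters multiplies quickly. Only the parameter names (bgArea, areaDifference, effectiveMaximum) come from the response; the candidate values and optional toggles below are invented for illustration.

```python
# Hypothetical illustration of why exhaustively testing interdependent
# parameters is costly. Candidate values are invented; only the parameter
# names come from the authors' response.
from itertools import product

grid = {
    "bgArea": [0.2, 0.3, 0.4, 0.5],
    "areaDifference": [0.05, 0.1, 0.2],
    "effectiveMaximum": [1.5, 2.0, 2.5],
    "optionalFilter1": [True, False],  # user-switchable input options
    "optionalFilter2": [True, False],
}

n_runs = len(list(product(*grid.values())))
print(n_runs)  # 4 * 3 * 3 * 2 * 2 = 144 combinations
# Because the parameters interact, none can be tuned in isolation: every
# combination must be re-evaluated on every benchmark data set, and the
# run count grows multiplicatively with each added parameter or value.
```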

COMMENT:

In summary,

1. I find the article acceptable if it is classified as reporting a software tool. I’d recommend that the authors tone down their claim of novelty with regards to “an algorithm for the automatic identification of unreliable probing data” as this claim requires objective validations that are not there. I understand that the authors may have added this claim in response to my earlier comment about lack of novelty. My comment was based on assessing the manuscript as a methods research article, which may not be the intention of the authors. So it is best to remove this claim. Also, the authors should make the raw data used for figures 3-4 available as supplementary data, or provide links if they are available online.

RESPONSE:

From the very beginning, we intended to present a new software tool, not a new method. We agree that the paper should be classified as reporting a software tool. As recommended, we removed the claim of novelty regarding the unreliable data identification.

We have also prepared a supplementary materials file that includes the raw data used for Figures 3-4. These data are also available as example data in the RNAthor web application.

COMMENT:

2. To be accepted as a methods research article, major revisions are still needed to demonstrate that the method is indeed optimal and applicable for general use. Lack of gold standards to objectively evaluate methods for analysis of RNA structure-probing data is a challenge faced by all methods researchers active in the field. However, the community has also found acceptable ways for such evaluation. If the authors did indeed mean to publish RNAthor as a methods research article, I hope that the authors will borrow ideas from other manuscripts and consider validating their method.

RESPONSE:

We agree that our paper is not a methods research article. We suppose the misunderstanding was caused by the fact that we sometimes use the terms algorithm, method, procedure, and software tool interchangeably. To clarify this issue, we have introduced several modifications in the manuscript.

--------------------------------------------------------------------------------------------------------------------

Response to the comments of Referee #2:

COMMENT:

The authors added several useful features to their web server, such as options to perform and visualize data-directed secondary structure prediction and to analyze DMS data. Hopefully, this would make the proposed tool more appealing to potential users, although overall, the main novelty in this work remains a fairly simple routine for automated detection of unreliable data points. A comparison between manual and automated data processing helps demonstrate that the automated detection is reliable/judicious, although I would expect to see more examples if I were a potential user, or at least manual analysis by additional experts (not the makers of the tool). Regarding the optional statistical tests, I think there are more statistically-sound and powerful differential reactivity analysis methods out there, and I don’t feel what’s offered by this tool is very powerful.

Finally, please note that the graphics are of poor quality and should be improved.

RESPONSE:

Referring to the suggestion to include more example data, we would like to note that although we currently provide one SHAPE example and one DMS example on the RNAthor web server, these are complex examples. Each of them consists of three raw data files obtained for a relatively large RNA. The user can download the example files and rework them to create new examples (e.g., load only two data files instead of three, cut out the data for a selected molecule, or shorten the molecule's sequence). We believe these files leave ample room for testing our program. We would be happy to add more examples if scientists from other labs provide us with their data. Unfortunately, we have not managed to find publicly available raw CE data that we could use. We still have data sets from our laboratory, which we will add as running examples to the RNAthor web server as soon as we publish the structures determined from these data.

As for the statistical tests, we agree that various statistical methods could be implemented apart from the ones we provided. So far, we have included basic tests which, to our knowledge, are commonly used in many labs (including the Weeks lab and our own). We are always open to suggestions from our users, and we often add new options to our web servers if users request them and explain why the new options are necessary. In the future, we will gladly enrich RNAthor with new features, including more powerful statistical tests.
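
To illustrate what such basic tests look like in practice, the sketch below compares two per-nucleotide reactivity profiles with a Pearson correlation and a paired non-parametric test. This is a hedged example of commonly used tests, assembled for illustration; it is an assumption, not necessarily the exact set of tests implemented in RNAthor, and the reactivity values are invented.

```python
# Hypothetical example of a basic statistical comparison of two SHAPE
# reactivity profiles (e.g., the same RNA probed under two conditions).
# Values are invented; RNAthor's actual test suite may differ.
import numpy as np
from scipy import stats

condition_a = np.array([0.1, 0.8, 1.2, 0.05, 0.4, 0.9, 1.5, 0.2])
condition_b = np.array([0.2, 0.7, 1.1, 0.10, 0.5, 1.0, 1.3, 0.3])

# Pearson correlation: overall agreement between the two profiles.
r, r_p = stats.pearsonr(condition_a, condition_b)

# Wilcoxon signed-rank test: paired, non-parametric test for a systematic
# shift in reactivity between the two conditions at the same positions.
w, w_p = stats.wilcoxon(condition_a, condition_b)

print(f"Pearson r = {r:.2f} (p = {r_p:.3f})")
print(f"Wilcoxon statistic = {w:.1f} (p = {w_p:.3f})")
```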

The poor quality of the figures in the manuscript is probably due to very high compression on the PLOS server side. We can also see it in the PDF version generated by the submission server. All figures in the original version meet PLOS requirements (TIFF format, 400 dpi resolution) and are of sufficient quality.

Attachment

Submitted filename: RNAthor-response-to-reviewers.pdf

Decision Letter 2

Danny Barash

3 Sep 2020

RNAthor – fast, accurate normalization, visualization and statistical analysis of RNA probing data resolved by capillary electrophoresis

PONE-D-20-09557R2

Dear Dr. Szachniuk,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Danny Barash

Academic Editor

PLOS ONE


Acceptance letter

Danny Barash

22 Sep 2020

PONE-D-20-09557R2

RNAthor – fast, accurate normalization, visualization and statistical analysis of RNA probing data resolved by capillary electrophoresis

Dear Dr. Szachniuk:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Danny Barash

Academic Editor

PLOS ONE

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

S1 File

(PDF)

Attachment

Submitted filename: Response-to-referees.pdf

Attachment

Submitted filename: RNAthor-response-to-reviewers.pdf

Data Availability Statement

All relevant data are within the manuscript and its Supporting Information files.

