Multicellular, IVT-derived, unmodified human transcriptome for nanopore-direct RNA analysis

. 2024 Jun 17;2024:gigabyte129. doi: 10.46471/gigabyte.129

Reviewer name and names of any other individual's who aided in reviewer	Jiaxu Wang
Do you understand and agree to our policy of having open and named reviews, and having your review included with the published papers. (If no, please inform the editor that you cannot review this manuscript.)	Yes
Is the language of sufficient quality?	Yes
Please add additional comments on language quality to clarify if needed
Are all data available and do they match the descriptions in the paper?	Yes
Additional Comments
Are the data and metadata consistent with relevant minimum information or reporting standards? See GigaDB checklists for examples <a href="http://gigadb.org/site/guide" target="_blank">http://gigadb.org/site/guide</a>	Yes
Additional Comments
Is the data acquisition clear, complete and methodologically sound?	Yes
Additional Comments
Is there sufficient detail in the methods and data-processing steps to allow reproduction?	Yes
Additional Comments
Is there sufficient data validation and statistical analyses of data quality?	No
Additional Comments	The authors ran DSR for the in vitro transcribed transcriptional RNAs from 6 cell lines to remove the possible natural modifications. The data can be used as a control RNA pool for natural or artificial modification studies. however, more statistical analyses should be performed for the data quality. see comments below: (1) For more possible usage of this data, some QC analysis is better to be provided to confirm the quality of these sequencing data. For example: 1) What is the correlation between in vitro transcribed transcriptional RNAs and original DSR for each cell line? 2) how many genes have been captured in each cell line? (2) In Figure 2B, the author provides 3 conditions for ‘exclude’ and ‘include’, some statistical analysis should be performed to confirm how many cases in condition 1, condition 2, and condition 3. How many mismatches are showing in only 1 cell line, some cell lines or all the cell lines? The shared correct genes may be more confident references for the modification analysis. (3) Different reads of the same gene could have different mismatches in the IVT RNAs due to RT-PCR bias or other reasons (especially for the lower expressed RNAs), for example, there are 100 reads in total, 90 reads are the correct nucleotide at a given position, 10 reads have a mismatch in the IVT sample, then how to define the signal as the control reference? Given that the nature modification is low in RNA, some threshold should be applied for the confident result, for example, what is the lowest expression threshold that could be used as a confident control reference?
Is the validation suitable for this type of data?	Yes
Additional Comments
Is there sufficient information for others to reuse this dataset or integrate it with other data?	No
Additional Comments	For more possible usage of this data, more QC data should be performed, please refer to my above comments
Any Additional Overall Comments to the Author
Recommendation	Major Revision