With great interest, we read the paper by Zhou et al. (1) describing a methodology that enables extracellular RNA sequencing (exRNA-seq) from extremely low input (Small Input Liquid Volume Extracellular RNA Sequencing [SILVER-seq]). We were intrigued by the high number of detected genes compared to our previous studies (2, 3) and noticed low reproducibility. We hypothesized that these observations could originate from substantial DNA contamination. Therefore, we reanalyzed the SILVER-seq data (4) to determine the extent of DNA signal in the sequencing reads.
First, we analyzed the fraction of reads mapping to the different genomic regions. These fractions closely resembled the proportions of those regions in the genome itself (Fig. 1A). Specifically, fewer than 5% of the reads mapped to exonic regions, whereas our own exRNA-seq data (3) showed an average of 35% exonic reads. Second, we analyzed reads mapping to spliced sequences, which are expected to be relatively abundant in RNA. However, such reads made up only 0.22% of the total uniquely mapped reads, whereas they represented 17.8% in our own RNA-seq data, about 81-fold higher (Fig. 1B). Third, we generated copy number profiles for a female patient with breast cancer (SRR9094442) and a healthy male control (SRR9094547). The cancer patient’s profile showed clear copy number changes (e.g., on chromosomes 5, 11, and 20), a result typically obtained from cell-free DNA data (Fig. 2A). The profile of the male control was almost flat, with chromosomes X and Y showing half the copy number level of the autosomes (Fig. 2B), in line with what is expected for the cell-free DNA of a normal control. Finally, strandedness assessment of the SILVER-seq reads could not unambiguously confirm that the data come from RNA (Fig. 1C). This means either that the library preparation method does not preserve the strand orientation of the fragments (which is not specified in the paper) or that the data predominantly come from DNA. In an attempt to use only reads that must originate from RNA, we restricted the analysis to exRNA genes with reads mapping over splice junctions and with transcripts per million higher than 5, as recommended by the authors (1). A median of only 560 genes per sample remained after filtering, 44-fold fewer than reported.
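The spliced-read check described above can be illustrated with a minimal sketch. It assumes that alignments come from a splice-aware aligner that encodes introns as `N` operations in the SAM CIGAR string. The function names `is_spliced` and `spliced_fraction` and the toy CIGAR strings are hypothetical, chosen for illustration only, and are not taken from the code deposited with this letter (5). The last lines simply redo the fold-difference arithmetic for the two percentages quoted in the text.

```python
import re

# A CIGAR string is a sequence of (length, operation) pairs. The 'N'
# operation marks a skipped reference region, which splice-aware aligners
# use to represent an intron (assumption: such an aligner was used).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def is_spliced(cigar: str) -> bool:
    """Return True if the alignment contains at least one 'N' operation."""
    return any(op == "N" for _, op in CIGAR_OP.findall(cigar))


def spliced_fraction(cigars) -> float:
    """Fraction of alignments that span at least one splice junction."""
    cigars = list(cigars)
    if not cigars:
        return 0.0
    return sum(is_spliced(c) for c in cigars) / len(cigars)


# Toy input: hypothetical CIGAR strings, not taken from the SILVER-seq data.
toy_cigars = ["75M", "30M1200N45M", "75M", "10S65M"]
print(spliced_fraction(toy_cigars))  # 0.25

# Fold difference between the two spliced-read percentages quoted in the text.
fold_difference = 17.8 / 0.22
print(round(fold_difference))  # 81
```

In practice the CIGAR strings would be read from the uniquely mapped alignments of each sample; a fraction close to zero, as observed here, is what one would expect from reads derived from genomic DNA rather than from processed transcripts.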
Our reanalyses provide evidence that the majority of the SILVER-seq data are derived from DNA rather than exRNA. Although the authors performed a DNase treatment aimed at preventing this issue (1), no quality control was performed to verify its efficacy. We hypothesize that the amount of cell-free DNA was too high or that inhibitors present in serum precluded efficient enzymatic DNA removal. Moreover, the authors did not perform any data analysis evaluating the presence of DNA signal in their sequencing data, such as the analyses reported here. Importantly, we emphasize that our observations do not undermine the potential utility of SILVER-seq. Our letter aims to serve as a reminder of the current limitations of RNA-seq workflows on biofluids and as a plea for extensive quality control of RNA-seq data in general.
Data Availability Statement.
The code used for data analysis is available on GitHub at https://github.com/jasperverwilt/SILVER-Seq_comment (5).
Footnotes
The authors declare no competing interest.
References
- 1. Zhou Z., et al., Extracellular RNA in a single droplet of human serum reflects physiologic and disease states. Proc. Natl. Acad. Sci. U.S.A. 116, 19200–19208 (2019).
- 2. Hulstaert E., et al., Charting extracellular transcriptomes in The Human Biofluid RNA Atlas. bioRxiv, 10.1101/823369 (5 November 2019).
- 3. Everaert C., et al., Performance assessment of total RNA sequencing of human biofluids and extracellular vesicles. Sci. Rep. 9, 17574 (2019).
- 4. Zhou Z., Zhong S., Data from “Extracellular RNA in a single droplet of human serum reflects physiologic and disease states”. Gene Expression Omnibus. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE131512. Accessed 8 January 2020.
- 5. Verwilt J., Van Paemel R., Code for “When DNA gets in the way: A cautionary note for DNA contamination in extracellular RNA-seq studies”. GitHub. https://github.com/jasperverwilt/SILVER-Seq_comment. Deposited 28 January 2020.