Proceedings of the National Academy of Sciences of the United States of America
letter
2020 Aug 11;117(32):18934–18936. doi: 10.1073/pnas.2001675117

When DNA gets in the way: A cautionary note for DNA contamination in extracellular RNA-seq studies

Jasper Verwilt a,b,1, Wim Trypsteen a,b, Ruben Van Paemel a,b, Katleen De Preter a,b, Maria D Giraldez c,d, Pieter Mestdagh a,b, Jo Vandesompele a,b
PMCID: PMC7431080  PMID: 32788394

With great interest, we read the paper by Zhou et al. (1) describing a methodology that enables extracellular RNA sequencing (exRNA-seq) from extremely low input (Small Input Liquid Volume Extracellular RNA Sequencing [SILVER-seq]). We were intrigued by the high number of detected genes compared to our previous studies (2, 3) and noticed low reproducibility. We hypothesized that these observations could originate from substantial DNA contamination. Therefore, we reanalyzed the SILVER-seq data (4) to determine the extent of DNA signal in the sequencing reads.

First, we analyzed the fraction of reads mapping to the different genomic regions. These fractions closely resembled the proportions of those regions in the genome (Fig. 1A). Specifically, fewer than 5% of the reads mapped to exonic regions, whereas our own exRNA-seq data (3) showed an average of 35% exonic reads. Second, we analyzed reads mapping to spliced sequences, which are expected to be relatively abundant in RNA. However, reads mapping to spliced sequences made up only 0.22% of the total uniquely mapped reads, whereas in our own RNA-seq data they represented 17.8%, about 81-fold higher (Fig. 1B). Third, we generated copy number profiles for a female patient with breast cancer (SRR9094442) and a healthy male control (SRR9094547). The cancer patient’s profile showed clear copy number changes (e.g., on chromosomes 5, 11, and 20), a result typically obtained from cell-free DNA data (Fig. 2A). The male control displayed an almost flat copy number profile, with chromosomes X and Y showing half the copy number level of the autosomes (Fig. 2B), as expected for the cell-free DNA of a normal control. Finally, strandedness assessment of the SILVER-seq reads could not unambiguously confirm that the data come from RNA (Fig. 1C). This means either that the library preparation method does not preserve the strand orientation of the fragments (which is not specified in the paper) or that the data predominantly come from DNA. In an attempt to use only reads that must originate from RNA, we restricted the analysis to exRNA genes with reads mapping over splice junctions and with transcripts per million higher than 5, as recommended by the authors (1). A median of only 560 genes per sample remained after filtering, 44-fold fewer than reported.
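As an illustration of the splice-read and gene-filter checks described above, the following is a minimal sketch and not the authors' implementation (their code is in the GitHub repository cited in the Data Availability Statement). It assumes that a read counts as spliced when its SAM CIGAR string contains an N operation (a skipped reference region), and that "reads mapping over splice junctions" means at least one junction-spanning read per gene. The function names, the toy CIGAR strings, and the toy gene table are hypothetical.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM specification).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def is_spliced(cigar: str) -> bool:
    """Return True if the alignment skips a reference region (CIGAR 'N')."""
    return any(op == "N" for _, op in CIGAR_OP.findall(cigar))


def spliced_fraction(cigars) -> float:
    """Fraction of alignments whose CIGAR contains at least one 'N' operation."""
    cigars = list(cigars)
    if not cigars:
        raise ValueError("no alignments given")
    return sum(is_spliced(c) for c in cigars) / len(cigars)


def filter_genes(genes, min_junction_reads=1, min_tpm=5.0):
    """Keep genes with junction-spanning reads and TPM above the threshold.

    `genes` maps gene name -> (junction_read_count, tpm).
    The junction-read threshold of 1 is an assumption of this sketch.
    """
    return [
        name
        for name, (junction_reads, tpm) in genes.items()
        if junction_reads >= min_junction_reads and tpm > min_tpm
    ]


# Hypothetical toy alignments: only the second one spans an intron.
toy_cigars = ["75M", "30M1200N45M", "75M", "10S65M"]
toy_fraction = spliced_fraction(toy_cigars)

# Fold difference between the two spliced-read percentages quoted in the text.
fold_difference = 17.8 / 0.22

# Hypothetical toy gene table: (junction-spanning reads, TPM).
toy_genes = {"GENE_A": (12, 40.0), "GENE_B": (0, 120.0), "GENE_C": (3, 2.5)}
kept_genes = filter_genes(toy_genes)
```

In this toy example only GENE_A passes both criteria: GENE_B has a high TPM but no junction-spanning reads, which is the pattern one would expect when signal comes from DNA rather than spliced RNA.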

Fig. 1.

Regional coverage, splice read fractions, and strandedness of the data. (A) Fractions of reads mapping to exonic, intronic, and intergenic regions. The average fractions and average number of reads are printed. The bottom and top dashed blue lines indicate the fraction of base pairs classified as exonic (0.0427) and intronic/intergenic (0.479) in the genome. These values are the fractions of reads expected to map to exonic and intronic/intergenic regions, respectively, if the reads originated from random locations in the genome. (B) Fraction of reads mapping to splice and nonsplice regions. The average fractions are printed. (C) Strandedness of the data. The “same strand” fraction is expected to be 1 for stranded data and 0.5 for unstranded data. The average fractions are printed.
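The two reference values used in this figure (the genomic exonic fraction and the expected "same strand" fraction) can be turned into simple checks. The sketch below is illustrative only and rests on stated assumptions: the exonic fraction of 0.0427 is taken from the caption, read strands are assumed to be already oriented to the transcript strand (some stranded protocols report the opposite strand for the first read), and the function names and toy strand pairs are hypothetical.

```python
GENOME_EXONIC_FRACTION = 0.0427  # fraction of exonic base pairs, from the caption


def exonic_enrichment(observed_fraction, genome_fraction=GENOME_EXONIC_FRACTION):
    """Ratio of observed exonic read fraction to the random-origin expectation.

    A value near 1 is what reads sampled uniformly from the genome would give.
    """
    return observed_fraction / genome_fraction


def same_strand_fraction(strand_pairs) -> float:
    """Fraction of reads on the same strand as the gene they overlap.

    `strand_pairs` is an iterable of (read_strand, gene_strand) tuples.
    Expected to be close to 1 for stranded RNA data and 0.5 for unstranded data.
    """
    strand_pairs = list(strand_pairs)
    if not strand_pairs:
        raise ValueError("no reads given")
    matches = sum(read == gene for read, gene in strand_pairs)
    return matches / len(strand_pairs)


# 35% exonic reads is the average reported in the text for the authors' own data.
own_data_enrichment = exonic_enrichment(0.35)

# Hypothetical toy reads: three of four share the strand of their gene.
toy_pairs = [("+", "+"), ("-", "+"), ("+", "+"), ("-", "-")]
toy_strandedness = same_strand_fraction(toy_pairs)
```

Under these assumptions, 35% exonic reads corresponds to roughly an eightfold enrichment over the random-origin expectation, whereas an exonic fraction close to 0.0427 gives an enrichment close to 1.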

Fig. 2.

Copy number profiles generated from SILVER-seq data. Yellow segments indicate a lower copy number, and green segments indicate a higher copy number. (A) Copy number profile of a female breast cancer patient. (B) Copy number profile of a healthy male.
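To make the expectation for the male control concrete: a single-copy chromosome compared against two-copy autosomes should give a copy ratio of 0.5, or a log2 ratio of -1. The following sketch shows that arithmetic on hypothetical per-chromosome read counts. It is not the method used to generate the profiles in Fig. 2; it omits the corrections a real copy number pipeline would apply (for example for bin length, GC content, and mappability), and the function name and counts are assumptions.

```python
import math
from statistics import median


def log2_copy_ratios(read_counts, autosomal_bins):
    """log2 ratio of each bin's read count to the median autosomal count.

    `read_counts` maps bin name -> read count (bins assumed to be of equal size).
    Bins with zero reads are skipped because their log2 ratio is undefined.
    """
    reference = median(read_counts[b] for b in autosomal_bins)
    return {
        name: math.log2(count / reference)
        for name, count in read_counts.items()
        if count > 0
    }


# Hypothetical equal-sized bins for a male sample: chrX has half the autosomal depth.
toy_counts = {"chr1": 1000, "chr2": 1020, "chr3": 980, "chrX": 500}
toy_ratios = log2_copy_ratios(toy_counts, ["chr1", "chr2", "chr3"])
```

Here the autosomal bins sit near a log2 ratio of 0 and the chrX bin at -1, which is the flat profile with half-level sex chromosomes described for the healthy male control.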

Our reanalyses provide evidence that the majority of the SILVER-seq data are derived from DNA rather than exRNA. Although the authors performed a DNase treatment intended to prevent this issue (1), no quality control was performed to verify its efficacy. We hypothesize that the amount of cell-free DNA was too high or that inhibitors present in serum precluded efficient enzymatic DNA removal. Moreover, the authors did not perform any data analysis to evaluate the presence of DNA signal in their sequencing data, such as the analyses reported here. Importantly, we emphasize that our observations do not undermine the potential utility of SILVER-seq. Our letter aims to serve as a reminder of the current limitations of RNA-seq workflows on biofluids and as a plea for extensive quality control of RNA-seq data in general.

Data Availability Statement.

The code used for data analysis is available on GitHub at https://github.com/jasperverwilt/SILVER-Seq_comment (5).

Footnotes

The authors declare no competing interest.

References


