Figure - PMC

Skip to main content

An official website of the United States government

Here's how you know

Here's how you know

Official websites use .gov
A .gov website belongs to an official government organization in the United States.

Secure .gov websites use HTTPS
A lock ( ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.

View full-text article in PMC

. 2011 Aug 12;5:98. doi: 10.3389/fnins.2011.00098

Search in PMC
Search in PubMed
View in NLM Catalog
Add to search

Copyright © 2011 Chen, Liu, Gong, Wu, Taylor, Williams, Matta and Sharp.

This is an open-access article subject to a non-exclusive license between the authors and Frontiers Media SA, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and other Frontiers conditions are complied with.

PMC Copyright notice

Effect of sequencing depth on RNA-seq data quality. (A) We obtained 78 million 50-bp reads from rat 1. Different numbers of reads were randomly sampled from the full data set of 78 million reads. The number of unique RefSeq genes was calculated from the full and randomly sampled subsets. The total number of unique genes remained stable when greater than 9.8 million reads were analyzed. (B) Gene expression levels calculated from these randomly selected subsets were highly correlated with the full data set when RPKM is above ∼2³, despite gradual decreasing the number of reads to 9.8 million. (C) The number of unique exons detected were calculated for the entire and randomly sampled subsets. The total number of exons remained stable when greater than 9.8 million reads were analyzed. (D) The number of reads for each exon obtained from the randomly sampled data sets were correlated to the entire data set. The correlation for exons with more than ∼2⁵ reads was greater than 0.9 despite the decrease in reads to 9.8 million. At 19.5 million reads, exons with ∼8 reads still had correlation of ∼0.7 with values obtained from the entire data set. These results suggest that ∼15–20 million reads are sufficient to generate estimates of gene expression levels that are similar to those obtained from 70 to 80 million reads.