NAR Genom Bioinform. 2020 Jun 19;2(2):lqaa040. doi: 10.1093/nargab/lqaa040

Table 2.

The number (and proportion) of (a) sequences, (b) features (mRNAs for RNA-seq data; OTUs for metagenomic data), (c) data points with zero counts and (d) data points with counts between 2 and 9 that remain after each filtering method, for the three example studies.

| Dataset and threshold | (a) No. of sequences | % | (b) No. of features | % | (c) No. of zeros | % | (d) No. of counts 2–9 | % |
|---|---:|---:|---:|---:|---:|---:|---:|---:|
| Yeast RNA-seq, N = 16 samples (mRNA) | | | | | | | | |
| No filtering | 37 710 728 | 100.00 | 3034 | 100.00 | 56 | 100.00 | 278 | 100.00 |
| Relative abundance ≥ 0.0001 | 34 330 805 | 91.04 | 2019 | 66.55 | 0 | 0.00 | 7 | 2.52 |
| Relative abundance ≥ 0.001 | 22 464 080 | 59.57 | 317 | 10.45 | 0 | 0.00 | 0 | 0.00 |
| Relative abundance ≥ 0.01 | 8 277 104 | 21.95 | 24 | 0.79 | 0 | 0.00 | 0 | 0.00 |
| Count ≥ 2 | 37 710 696 | 100.00 | 3031 | 99.90 | 8 | 14.29 | 278 | 100.00 |
| Count ≥ 10 | 37 708 896 | 100.00 | 3029 | 99.84 | 7 | 12.50 | 269 | 96.76 |
| Tara Oceans, N = 139 samples (OTU) | | | | | | | | |
| No filtering | 14 129 941 | 100.00 | 35 651 | 100.00 | 4 394 814 | 100.00 | 199 424 | 100.00 |
| Relative abundance ≥ 0.0001 | 13 093 797 | 92.67 | 7250 | 20.34 | 595 938 | 13.56 | 155 003 | 77.73 |
| Relative abundance ≥ 0.001 | 8 241 812 | 58.33 | 2450 | 6.87 | 135 678 | 3.09 | 56 849 | 28.51 |
| Relative abundance ≥ 0.01 | 1 499 364 | 10.61 | 113 | 0.32 | 5324 | 0.12 | 2369 | 1.19 |
| Count ≥ 2 | 13 941 637 | 98.67 | 19 803 | 55.55 | 2 222 449 | 50.57 | 199 424 | 100.00 |
| Count ≥ 10 | 13 147 108 | 93.04 | 7483 | 20.99 | 623 333 | 14.18 | 157 107 | 78.78 |
| Gut microbiome, N = 265 samples (OTU) | | | | | | | | |
| No filtering | 17 365 964 | 100.00 | 10 000 | 100.00 | 2 535 419 | 100.00 | 37 964 | 100.00 |
| Relative abundance ≥ 0.0001 | 17 266 878 | 99.43 | 9862 | 98.62 | 2 499 064 | 98.57 | 37 893 | 99.81 |
| Relative abundance ≥ 0.001 | 16 302 087 | 93.87 | 8992 | 89.92 | 2 276 347 | 89.78 | 34 302 | 90.35 |
| Relative abundance ≥ 0.01 | 12 125 721 | 69.82 | 1521 | 15.21 | 370 082 | 14.60 | 7431 | 19.57 |
| Count ≥ 2 | 17 346 927 | 99.89 | 9897 | 98.97 | 2 508 181 | 98.93 | 37 964 | 100.00 |
| Count ≥ 10 | 17 180 567 | 98.93 | 9419 | 94.19 | 2 382 756 | 93.98 | 37 141 | 97.83 |
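The exact filtering rules behind Table 2 are not spelled out here. As a minimal sketch, assuming (1) relative abundance is a feature's total count divided by the grand total of the count table and (2) the count filter keeps a feature whose total count across samples is at least the threshold, the Python below applies each kind of filter to a small feature-by-sample table and computes the four summary columns (a) to (d). The function names and the toy table are hypothetical, and the sketch is not meant to reproduce the numbers in Table 2.

```python
"""Toy sketch of feature-level filtering and the four Table 2 summary columns.

ASSUMPTIONS (not taken from the paper): relative abundance is a feature's total
count divided by the grand total of the table, and the count filter keeps a
feature when its total count across samples is >= min_count.
"""
from typing import Dict, List

# Hypothetical layout: feature id -> counts per sample.
CountTable = Dict[str, List[int]]


def filter_by_relative_abundance(table: CountTable, threshold: float) -> CountTable:
    """Keep features whose pooled share of all sequences is >= threshold (assumed rule)."""
    grand_total = sum(sum(counts) for counts in table.values())
    if grand_total == 0:
        return {}
    return {f: c for f, c in table.items() if sum(c) / grand_total >= threshold}


def filter_by_count(table: CountTable, min_count: int) -> CountTable:
    """Keep features whose total count across samples is >= min_count (assumed rule)."""
    return {f: c for f, c in table.items() if sum(c) >= min_count}


def summarize(table: CountTable) -> Dict[str, int]:
    """Columns (a)-(d): sequences, features, zero entries, entries with counts 2-9."""
    entries = [x for counts in table.values() for x in counts]
    return {
        "sequences": sum(entries),
        "features": len(table),
        "zeros": sum(1 for x in entries if x == 0),
        "counts_2_9": sum(1 for x in entries if 2 <= x <= 9),
    }


def percent_remaining(filtered: Dict[str, int], original: Dict[str, int]) -> Dict[str, float]:
    """Percentage of each column that survives filtering (0.0 when the original is 0)."""
    return {
        k: 100.0 * filtered[k] / original[k] if original[k] else 0.0
        for k in original
    }


if __name__ == "__main__":
    # Hypothetical 4-feature x 3-sample table, not data from the paper.
    toy: CountTable = {
        "otu1": [120, 80, 0],
        "otu2": [5, 0, 3],
        "otu3": [1, 0, 0],
        "otu4": [0, 9, 2],
    }
    base = summarize(toy)
    for label, kept in [
        ("relative abundance >= 0.01", filter_by_relative_abundance(toy, 0.01)),
        ("count >= 10", filter_by_count(toy, 10)),
    ]:
        stats = summarize(kept)
        print(label, stats, percent_remaining(stats, base))
```

On the toy table (220 sequences in total), the relative-abundance filter at 0.01 drops only otu3 (1 sequence), while the count filter at 10 also drops otu2 (8 sequences), which shows how the two kinds of threshold can remove different features even on the same data.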

The first row of each dataset shows the unfiltered data. For yeast, there are 37.7M sequences, which collapse into 3034 features after clustering; of the data points in the count table, 56 are zero counts and 278 have counts between 2 and 9.

In the Tara Oceans dataset, the second row shows that filtering on relative abundance ≥ 0.0001 reduces the number of OTUs from 35 651 to 7250 (20.34%), which is comparable to filtering on an absolute minimum count of 10 (7483 OTUs, 20.99%). The number of zero-count data points also falls sharply, from about 4.4M to 596K (13.56%).
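The percentages quoted for Tara Oceans are simply the filtered value divided by the unfiltered value, times 100, rounded to two decimals. A small check in Python, using values copied from Table 2 (the helper name `percent_of_unfiltered` is hypothetical):

```python
"""Recompute the Tara Oceans percentages quoted in the text from Table 2 values."""


def percent_of_unfiltered(kept: int, total: int) -> float:
    """Share of the unfiltered total that survives filtering, rounded to 2 dp as in Table 2."""
    return round(100 * kept / total, 2)


# Values copied from Table 2 (Tara Oceans, N = 139 samples).
OTUS_UNFILTERED = 35_651
OTUS_RELATIVE_ABUNDANCE_1E4 = 7_250  # relative abundance >= 0.0001
OTUS_COUNT_GE_10 = 7_483  # count >= 10
ZEROS_UNFILTERED = 4_394_814
ZEROS_RELATIVE_ABUNDANCE_1E4 = 595_938  # relative abundance >= 0.0001

print(percent_of_unfiltered(OTUS_RELATIVE_ABUNDANCE_1E4, OTUS_UNFILTERED))  # 20.34
print(percent_of_unfiltered(OTUS_COUNT_GE_10, OTUS_UNFILTERED))  # 20.99
print(percent_of_unfiltered(ZEROS_RELATIVE_ABUNDANCE_1E4, ZEROS_UNFILTERED))  # 13.56
```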