2017 Jan 25;45(6):2960–2972. doi: 10.1093/nar/gkw1350

Table 2. The number and percent of reads filtered at each stage of pre-processing for all datasets used in this section.

| | M. Liver | M. EF | HEK293 | HEK293, Gao | C. elegans (aggregate) |
|---|---|---|---|---|---|
| Raw data | 9E+6 | 3E+7 | 3E+7 | 3E+7 | 9E+8 |
| Poor quality | 8E+4 (1%) | 4E+5 (1%) | 5E+5 (2%) | 3E+5 (1%) | 4E+7 (4%) |
| Ribosomal | 5E+6 (55%) | 6E+6 (17%) | 2E+6 (7%) | 2E+7 (66%) | 5E+8 (56%) |
| No alignment | 1E+6 (15%) | 9E+6 (27%) | 3E+6 (11%) | 3E+6 (10%) | 2E+8 (25%) |
| Multimappers | 6E+5 (7%) | 8E+6 (23%) | 6E+6 (20%) | 2E+6 (9%) | 3E+7 (3%) |
| Non-periodic | 1E+5 (1%) | 4E+5 (1%) | 4E+5 (2%) | 1E+5 (0%) | 6E+7 (7%) |
| Usable | 1E+6 (21%) | 1E+7 (31%) | 1E+7 (59%) | 4E+6 (14%) | 4E+7 (5%) |

‘Raw data’ gives the total number of reads in the dataset. ‘Poor quality’ reads are either too short after removing adapters or do not have adequate fastq quality scores. ‘Ribosomal’ reads map to known ribosomal sequences. ‘No alignment’ reads do not align to the genome. ‘Multimappers’ map to the genome in multiple locations. ‘Non-periodic’ reads are of lengths whose metagene profiles do not result in a periodic signal. ‘Usable’ reads are kept for further analysis. The detailed counts for all samples, including all C. elegans replicates, are given in Supplementary File 7. We obtain a much higher percentage of rRNA reads from dauer stage lysates than from lysates of other developmental stages of the C. elegans life cycle (2).
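The bookkeeping behind a table like this can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline. It assumes that the filters are applied in the order listed and that each percentage is taken relative to the raw read total, which is consistent with each column summing to roughly 100%. The function name `filtering_summary` and the read counts are hypothetical; the counts are made-up round numbers, not the values from Supplementary File 7.

```python
def filtering_summary(raw_total, filtered_counts):
    """Return (label, count, percent of raw) rows for each filter, plus the usable remainder.

    raw_total       -- total number of reads in the dataset
    filtered_counts -- list of (label, reads removed by that filter), in pipeline order
    """
    rows = []
    remaining = raw_total
    for label, count in filtered_counts:
        remaining -= count
        rows.append((label, count, 100.0 * count / raw_total))
    # Whatever survives every filter is kept for further analysis.
    rows.append(("Usable", remaining, 100.0 * remaining / raw_total))
    return rows


# Hypothetical counts, chosen only to illustrate the arithmetic.
example_counts = [
    ("Poor quality", 100_000),
    ("Ribosomal", 5_500_000),
    ("No alignment", 1_500_000),
    ("Multimappers", 700_000),
    ("Non-periodic", 100_000),
]

rows = filtering_summary(10_000_000, example_counts)
for label, count, pct in rows:
    print(f"{label:<13} {count:>10,d} ({pct:.0f}%)")
```

Because the published counts are rounded to one significant figure, feeding the table's own values into this function will not necessarily reproduce the printed percentages; the detailed per-sample counts would be needed for that.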