Table 2. Number and percentage of reads removed at each stage of pre-processing for all datasets used in this section.
Read category | M. Liver | M. EF | HEK293 | HEK293, Gao | C. elegans (aggregate) |
---|---|---|---|---|---|
Raw data | 9E+6 | 3E+7 | 3E+7 | 3E+7 | 9E+8 |
Poor quality | 8E+4 (1%) | 4E+5 (1%) | 5E+5 (2%) | 3E+5 (1%) | 4E+7 (4%) |
Ribosomal | 5E+6 (55%) | 6E+6 (17%) | 2E+6 (7%) | 2E+7 (66%) | 5E+8 (56%) |
No alignment | 1E+6 (15%) | 9E+6 (27%) | 3E+6 (11%) | 3E+6 (10%) | 2E+8 (25%) |
Multimappers | 6E+5 (7%) | 8E+6 (23%) | 6E+6 (20%) | 2E+6 (9%) | 3E+7 (3%) |
Non-periodic | 1E+5 (1%) | 4E+5 (1%) | 4E+5 (2%) | 1E+5 (0%) | 6E+7 (7%) |
Usable | 1E+6 (21%) | 1E+7 (31%) | 1E+7 (59%) | 4E+6 (14%) | 4E+7 (5%) |
‘Raw data’ gives the total number of reads in the dataset; all percentages are relative to this total. ‘Poor quality’ reads are either too short after adapter removal or have inadequate FASTQ quality scores. ‘Ribosomal’ reads map to known ribosomal RNA sequences. ‘No alignment’ reads do not align to the genome. ‘Multimappers’ align to multiple locations in the genome. ‘Non-periodic’ reads have lengths whose metagene profiles do not show a periodic signal. ‘Usable’ reads are kept for further analysis. Detailed counts for all samples, including all C. elegans replicates, are given in Supplementary File 7. We obtain a much higher percentage of rRNA reads from dauer-stage lysates than from lysates of other developmental stages of the C. elegans life cycle (2).
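Because the percentages in each column sum to roughly 100%, every raw read is counted in exactly one row. The Python sketch below illustrates that bookkeeping, assuming the rows reflect the order in which the filters are applied and that each read is assigned to the first filter that removes it. The `Read` record, its fields, the length and quality thresholds, and the `periodic_lengths` set are hypothetical stand-ins. The actual pipeline works on adapter-trimmed FASTQ records, alignments and metagene profiles, none of which are modelled here.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical filter stages, in the (assumed) order they are applied.
# Each read is counted only at the first stage that removes it.
STAGES = ("poor_quality", "ribosomal", "no_alignment", "multimapper", "non_periodic")


@dataclass
class Read:
    # Hypothetical per-read summary; field names are illustrative only.
    length: int           # length after adapter removal
    mean_quality: float   # summary of the FASTQ quality scores
    is_rrna: bool         # maps to a known ribosomal RNA sequence
    n_alignments: int     # number of genomic alignment locations


def classify(read, min_length, min_quality, periodic_lengths):
    """Return the first filter stage that removes the read, or 'usable'."""
    if read.length < min_length or read.mean_quality < min_quality:
        return "poor_quality"
    if read.is_rrna:
        return "ribosomal"
    if read.n_alignments == 0:
        return "no_alignment"
    if read.n_alignments > 1:
        return "multimapper"
    # periodic_lengths stands in for the read lengths whose metagene
    # profiles were judged periodic; it is an input here, not computed.
    if read.length not in periodic_lengths:
        return "non_periodic"
    return "usable"


def summarize(reads, **params):
    """Map each table row to a (count, percent-of-raw) pair."""
    counts = Counter(classify(r, **params) for r in reads)
    total = len(reads)  # the 'Raw data' row
    rows = {"raw": (total, 100.0 if total else 0.0)}
    for stage in STAGES + ("usable",):
        n = counts.get(stage, 0)
        rows[stage] = (n, 100.0 * n / total if total else 0.0)
    return rows
```

For example, `summarize(reads, min_length=18, min_quality=20.0, periodic_lengths={28, 29, 30})` (all three thresholds hypothetical) returns one `(count, percent)` pair per row, mirroring a single column of Table 2. Assigning each read to exactly one stage is what makes the non-raw percentages add up to 100%, up to rounding.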