2015 Sep 14;2015:292950. doi: 10.1155/2015/292950

Table 3.

Sequence output and data storage for the two datasets. The number of sequences surviving the common preprocessing stages is shown; classified sequences are based on the targeted-then-assembly approach for the viral dataset and on the k-mer-based approach for the nonhuman model dataset. Percentages are based on the expected number of paired-end (PE) sequences generated for each sequencing chemistry kit used. Storage (in GB) covers all FASTQ and intermediate files, including BAM and BED format files, generated throughout the analysis.

Sample                      Dataset 1: viral panel             Dataset 2: nonhuman model
                            Reads within set  %     Data (GB)  Reads within set  %     Data (GB)
Predicted reads             15,000,000                         25,000,000
Sequenced reads             13,537,917        90.3  9.1        12,734,165        50.9  13.6
Preprocessing: trimming     12,223,513        81.5  15.8       11,520,499        46.1  24.5
Preprocessing: host screen  11,265,758        75.1             11,517,217        46.1
Classified sequences        8,006,562         53.4  7.3        2,788,450         11.2  5.5

Total storage                                       32.2                               43.6
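
The percentage and total storage columns of Table 3 can be reproduced from the read counts and per-stage storage figures. The following is a minimal sketch in Python, assuming that each percentage is the read count divided by the predicted read count for that dataset, rounded to one decimal place, and that total storage is the sum of the per-stage storage values. The dictionary layout and the names `PREDICTED`, `READS`, `STORAGE_GB`, and `percent_of_predicted` are illustrative and do not come from the original analysis.

```python
# Values transcribed from Table 3; names and layout are illustrative only.
PREDICTED = {"viral": 15_000_000, "nonhuman": 25_000_000}

READS = {
    "Sequenced reads": {"viral": 13_537_917, "nonhuman": 12_734_165},
    "Preprocessing: trimming": {"viral": 12_223_513, "nonhuman": 11_520_499},
    "Preprocessing: host screen": {"viral": 11_265_758, "nonhuman": 11_517_217},
    "Classified sequences": {"viral": 8_006_562, "nonhuman": 2_788_450},
}

# Storage (GB) for the stages that report it: sequenced, trimming, classified.
STORAGE_GB = {"viral": [9.1, 15.8, 7.3], "nonhuman": [13.6, 24.5, 5.5]}


def percent_of_predicted(reads: int, predicted: int) -> float:
    """Return reads as a percentage of the predicted read count (1 d.p.)."""
    return round(100 * reads / predicted, 1)


# Percentage of predicted reads remaining at each stage, per dataset.
percentages = {
    stage: {
        dataset: percent_of_predicted(count, PREDICTED[dataset])
        for dataset, count in counts.items()
    }
    for stage, counts in READS.items()
}

# Total storage per dataset, assumed to be the sum of the listed stages.
total_storage = {
    dataset: round(sum(values), 1) for dataset, values in STORAGE_GB.items()
}

if __name__ == "__main__":
    for stage, per_dataset in percentages.items():
        print(stage, per_dataset)
    print("Total storage (GB)", total_storage)
```

Under these assumptions the computed values agree with the table, which suggests that the percentages for every stage are relative to the predicted reads rather than to the preceding stage.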