2015 Sep 14;2015:292950. doi: 10.1155/2015/292950

Table 3.

Sequence output and data storage for the two datasets. The number of sequences surviving the common preprocessing stages is shown; classified sequences are based on the targeted-then-assembly approach for the viral dataset and on the k-mer-based approach for the nonhuman model dataset. Percentages are relative to the expected number of paired-end (PE) sequences for the sequencing chemistry kit used. Storage (in GB) comprises all FASTQ and intermediate files, including BAM and BED files, generated throughout the analysis.

Sample	Dataset 1: viral panel			Dataset 2: nonhuman model
Sample	Reads within set	%	Data (GB)	Reads within set	%	Data (GB)
Predicted reads	15,000,000	—	—	25,000,000	—	—
Sequenced reads	13,537,917	90.3	9.1	12,734,165	50.9	13.6
Preprocessing: trimming	12,223,513	81.5	15.8	11,520,499	46.1	24.5
Preprocessing: host screen	11,265,758	75.1	15.8	11,517,217	46.1	24.5
Classified sequences	8,006,562	53.4	7.3	2,788,450	11.2	5.5

Total storage			32.2			43.6
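
The percentage columns can be reproduced from the read counts, as the caption describes: each count is divided by the predicted number of reads for that dataset. The following is a minimal sketch of that arithmetic only; the function name `percent_of_predicted` and the dictionary layout are illustrative assumptions, not part of the original analysis.

```python
# Read counts copied from Table 3. The variable names and structure are
# hypothetical and chosen only for this illustration.
PREDICTED = {"viral": 15_000_000, "nonhuman": 25_000_000}

READS = {
    "Sequenced reads": {"viral": 13_537_917, "nonhuman": 12_734_165},
    "Preprocessing: trimming": {"viral": 12_223_513, "nonhuman": 11_520_499},
    "Preprocessing: host screen": {"viral": 11_265_758, "nonhuman": 11_517_217},
    "Classified sequences": {"viral": 8_006_562, "nonhuman": 2_788_450},
}


def percent_of_predicted(reads: int, predicted: int) -> float:
    """Return reads as a percentage of the predicted yield, to one decimal place."""
    return round(100 * reads / predicted, 1)


if __name__ == "__main__":
    for stage, counts in READS.items():
        viral = percent_of_predicted(counts["viral"], PREDICTED["viral"])
        nonhuman = percent_of_predicted(counts["nonhuman"], PREDICTED["nonhuman"])
        print(f"{stage}\t{viral}\t{nonhuman}")
```

Run against the counts above, this should give the same percentages as the table, for example 90.3 and 50.9 for the sequenced reads.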