2013 Dec 25;35(3):283–288. doi: 10.1002/humu.22503

Table 1.

Illustration of the Data Reduction at Each Step from Raw Reads to a Final Set of Mutated Loci

Data remaining at the end of each step

| Filtering step | Disease/normal pair | Family trio BH1019 | Family trio BH2041 | Family trio BH2688 |
| --- | --- | --- | --- | --- |
| Number of reads from proband/diseased tissue | 118,414,556 | 84,201,820 | 75,877,750 | 103,527,644 |
| Number of 27-mers in proband/diseased tissue | 911,738,627 | 795,477,167 | 517,272,851 | 1,088,610,020 |
| Number of k-mers with count >10 | 77,903,885 | 61,805,320 | 64,719,150 | 113,066,951 |
| Remove vector sequence | 77,898,848 | 61,800,798 | 64,713,995 | 113,062,417 |
| Eliminate k-mers found in reference GRCh37 exome | 17,821,359 | 9,385,347 | 10,730,208 | 50,535,681 |
| Eliminate k-mers found in parent exomes/normal tissue | 10,568 | 65,352 | 20,130 | 2,006 |
| Identify reads containing k-mers | 32,829 reads | 148,496 | 46,454 | 4,404 |
| Remove reads containing vector | 15,260 | 125,648 | 38,799 | 2,760 |
| Number of contigs after assembly | 2,147 | 13,189 | 3,755 | 359 |
| Number of contigs with >3 reads after merging contigs | 279 contigs | 1,437 | 701 | 71 |
| Identify variants covered by reads from normal tissue | 55 contigs | 5 | 6 | 2 |
| Keep variants with >5% coverage | 42 variants | 5 | 6 | 2 |
| Find variants in coding regions | 14 variants | 3 | 3 | 1 |
| Remove synonymous SNPs | 10 variants | 2 | 3 | 1 |
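
The k-mer filtering rows of Table 1 (count k-mers, keep those above a count threshold, discard any also seen in the reference or in the parents/normal tissue, then pull the reads that contain the survivors) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy sequences, and the small `k` and `min_count` values are assumptions chosen for illustration (the table itself uses 27-mers and a count threshold of 10), and the sketch ignores reverse complements, vector screening, and the memory constraints of real exome data.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every overlapping k-mer in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def count_kmers(reads, k):
    """Count how often each k-mer occurs across all reads."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def candidate_kmers(proband_reads, background_seqs, k, min_count):
    """Return k-mers seen more than min_count times in the proband
    and never seen in any background sequence (hypothetical helper)."""
    counts = count_kmers(proband_reads, k)
    frequent = {km for km, n in counts.items() if n > min_count}
    background = set()
    for seq in background_seqs:
        background.update(kmers(seq, k))
    return frequent - background


def reads_with_kmers(reads, targets, k):
    """Keep the reads that contain at least one target k-mer."""
    return [r for r in reads if any(km in targets for km in kmers(r, k))]


# Toy data: one background sequence and a proband carrying a C>T change.
reference = "ACGTACGGTCA"
mutant = "ACGTATGGTCA"
proband = [mutant, mutant, reference]

novel = candidate_kmers(proband, [reference], k=5, min_count=1)
supporting = reads_with_kmers(proband, novel, k=5)
print(sorted(novel))
print(supporting)
```

In this toy case only the k-mers spanning the changed base survive both filters, and only the two mutant reads are retained for the later assembly steps, which mirrors the sharp drop between the "count >10" row and the "parent exomes/normal tissue" row of the table.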