Table 1. Data remaining at the end of each filtering step.

Filtering step | Disease/normal pair | Family trio BH1019 | Family trio BH2041 | Family trio BH2688 |
---|---|---|---|---|
Number of reads from proband/diseased tissue | 118,414,556 | 84,201,820 | 75,877,750 | 103,527,644 |
Number of 27-mers in proband/diseased tissue | 911,738,627 | 795,477,167 | 517,272,851 | 1,088,610,020 |
Number of k-mers with count >10 | 77,903,885 | 61,805,320 | 64,719,150 | 113,066,951 |
Remove vector sequence | 77,898,848 | 61,800,798 | 64,713,995 | 113,062,417 |
Eliminate k-mers found in reference GRCh37 exome | 17,821,359 | 9,385,347 | 10,730,208 | 50,535,681 |
Eliminate k-mers found in parent exomes/normal tissue | 10,568 | 65,352 | 20,130 | 2,006 |
Identify reads containing k-mers | 32,829 | 148,496 | 46,454 | 4,404 |
Remove reads containing vector | 15,260 | 125,648 | 38,799 | 2,760 |
Number of contigs after assembly | 2,147 | 13,189 | 3,755 | 359 |
Number of contigs with >3 reads after merging contigs | 279 | 1,437 | 701 | 71 |
Identify variants covered by reads from normal tissue (contigs) | 55 | 5 | 6 | 2 |
Keep variants with >5% coverage | 42 | 5 | 6 | 2 |
Find variants in coding regions | 14 | 3 | 3 | 1 |
Remove synonymous SNPs | 10 | 2 | 3 | 1 |
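
The k-mer filtering rows of the table (count 27-mers, keep those with count >10, drop k-mers seen in the reference or in the parent/normal samples, then pull the reads that contain a surviving k-mer) can be illustrated with a minimal Python sketch. The function names, the in-memory `Counter`, and the omission of reverse-complement handling are assumptions made for illustration only; they are not taken from the pipeline that produced the table, which would need a disk-based k-mer counter at these data sizes.

```python
from collections import Counter

K = 27          # k-mer length listed in the table
MIN_COUNT = 10  # keep k-mers whose count is strictly greater than this


def kmers(seq, k=K):
    """Yield every overlapping substring of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def count_kmers(reads, k=K):
    """Count k-mer occurrences across all reads (hypothetical helper)."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def candidate_kmers(proband_reads, exclude_sets, k=K, min_count=MIN_COUNT):
    """Return k-mers with count > min_count that appear in no exclusion set.

    exclude_sets stands in for the vector, reference-exome, and
    parent/normal k-mer sets; how those sets are built is assumed.
    """
    counts = count_kmers(proband_reads, k)
    kept = {km for km, c in counts.items() if c > min_count}
    for excluded in exclude_sets:
        kept -= excluded
    return kept


def reads_with_kmers(reads, kept, k=K):
    """Return the reads that contain at least one surviving k-mer."""
    return [r for r in reads if any(km in kept for km in kmers(r, k))]
```

Each exclusion set can only shrink the candidate set, which matches the monotone drop in the k-mer rows of the table; the read-level and contig-level rows that follow would need an assembler and are outside this sketch.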