Table 1.
Length and kind of Illumina sequencing reads | 36-base GA-I end sequences | 36-base GA-I shotgun sequences | 76-base GA-II end sequences |
---|---|---|---|
Aggregate length of target<sup>a</sup> | 2.5 Mb | 2.5 Mb | 2.5 Mb |
Aggregate length of baits | 3.7 Mb | 3.7 Mb | 3.7 Mb |
| |||
Total raw unfiltered sequence | 152 Mb | 219 Mb<sup>b</sup> | 851 Mb |
Raw sequence not aligned uniquely to genome<sup>c</sup> | 67 Mb | 116 Mb | 358 Mb |
Uniquely aligned human sequence | 85 Mb | 102 Mb | 492 Mb |
| |||
Uniquely aligned sequence on target | 36 Mb | 51 Mb | 235 Mb |
Uniquely aligned sequence near target<sup>d</sup> | 40 Mb | 38 Mb | 210 Mb |
Uniquely aligned sequence on or near target | 76 Mb | 90 Mb | 445 Mb |
| |||
Fraction of uniquely aligned sequence on or near target<sup>e</sup> | 89% | 88% | 90% |
Fraction of raw bases uniquely aligned on or near target<sup>f</sup> | 50% | 41%<sup>g</sup> | 52% |
Fraction of uniquely aligned bases on target<sup>h</sup> | 42% | 50% | 48% |
<sup>a</sup> Protein-coding exon sequence only.
<sup>b</sup> Each unit of concatenated catch contains 44–46 bases (~18%) of generic adapter sequence. Therefore, ~18% (39 Mb) of the 219 Mb is not of human origin.
<sup>c</sup> All raw sequence that fails to align uniquely to the human reference genome including low-quality sequence.
<sup>d</sup> Outside but within 500 bp of a target exon.
<sup>e</sup> Upper bound for estimating the specificity of hybrid selection.
<sup>f</sup> Lower bound for estimating the specificity of hybrid selection.
<sup>g</sup> The denominator (219 Mb) includes ~39 Mb of sequence from the generic adapters. Excluding these 39 Mb, the lower bound for the estimated specificity of this catch is 90/180 = 50%.
<sup>h</sup> Upper bound for the overall specificity of targeted exon sequencing.
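
The three fractions in the table follow directly from the sequence totals above them, and the adapter correction in the footnote on the 41% shotgun value is a one-line adjustment of the denominator. The following is a minimal sketch of that arithmetic, not the authors' analysis code; the names `TABLE_1`, `ADAPTER_MB`, `pct`, and `summarize` are hypothetical, and the inputs are the rounded Mb values printed in the table, so results are rounded to whole percentages.

```python
# Values in Mb, transcribed from the rounded figures in Table 1.
TABLE_1 = {
    "36-base GA-I end": {"raw": 152, "unique": 85, "on": 36, "on_or_near": 76},
    "36-base GA-I shotgun": {"raw": 219, "unique": 102, "on": 51, "on_or_near": 90},
    "76-base GA-II end": {"raw": 851, "unique": 492, "on": 235, "on_or_near": 445},
}

# Approximate adapter-derived sequence in the shotgun catch (from the footnotes).
ADAPTER_MB = 39


def pct(numerator, denominator):
    """Return numerator/denominator as a percentage rounded to a whole number."""
    return round(100 * numerator / denominator)


def summarize(row):
    """Recompute the three fraction rows of the table for one column."""
    return {
        # On or near target / uniquely aligned: upper bound on hybrid-selection specificity.
        "upper_bound": pct(row["on_or_near"], row["unique"]),
        # On or near target / raw bases: lower bound on hybrid-selection specificity.
        "lower_bound": pct(row["on_or_near"], row["raw"]),
        # On target / uniquely aligned: upper bound on overall specificity.
        "on_target": pct(row["on"], row["unique"]),
    }


for name, row in TABLE_1.items():
    print(name, summarize(row))

# Adapter-corrected lower bound for the shotgun catch: 90 / (219 - 39).
shotgun = TABLE_1["36-base GA-I shotgun"]
adjusted_lower_bound = pct(shotgun["on_or_near"], shotgun["raw"] - ADAPTER_MB)
print("adapter-corrected lower bound:", adjusted_lower_bound)
```

Because the inputs are already rounded to the nearest Mb, small discrepancies against unrounded data are possible; the sketch only shows which quantities form each numerator and denominator.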