Author manuscript; available in PMC: 2009 Aug 1.
Published in final edited form as: Nat Biotechnol. 2009 Feb 1;27(2):182–189. doi: 10.1038/nbt.1523

Table 1.

Detailed breakdown of Illumina sequences generated from exon catches

| Length and kind of Illumina sequencing reads | 36-base GA-I end sequences | 36-base GA-I shotgun sequences | 76-base GA-II end sequences |
| --- | --- | --- | --- |
| Aggregate length of target (a) | 2.5 Mb | 2.5 Mb | 2.5 Mb |
| Aggregate length of baits | 3.7 Mb | 3.7 Mb | 3.7 Mb |
| Total raw unfiltered sequence | 152 Mb | 219 Mb (b) | 851 Mb |
| Raw sequence not aligned uniquely to genome (c) | 67 Mb | 116 Mb | 358 Mb |
| Uniquely aligned human sequence | 85 Mb | 102 Mb | 492 Mb |
| Uniquely aligned sequence on target | 36 Mb | 51 Mb | 235 Mb |
| Uniquely aligned sequence near target (d) | 40 Mb | 38 Mb | 210 Mb |
| Uniquely aligned sequence on or near target | 76 Mb | 90 Mb | 445 Mb |
| Fraction of uniquely aligned sequence on or near target (e) | 89% | 88% | 90% |
| Fraction of raw bases uniquely aligned on or near target (f) | 50% | 41% (g) | 52% |
| Fraction of uniquely aligned bases on target (h) | 42% | 50% | 48% |
(a) Protein-coding exon sequence only.

(b) Each unit of concatenated catch contains 44–46 bases (~18%) of generic adapter sequence. Therefore, ~18% (39 Mb) of the 219 Mb is not of human origin.

(c) All raw sequence that fails to align uniquely to the human reference genome, including low-quality sequence.

(d) Outside but within 500 bp of a target exon.

(e) Upper bound for estimating the specificity of hybrid selection.

(f) Lower bound for estimating the specificity of hybrid selection.

(g) The denominator (219 Mb) includes ~39 Mb of sequence from the generic adapters. Excluding these 39 Mb, the lower bound for the estimated specificity of this catch is 90/180 = 50%.

(h) Upper bound for the overall specificity of targeted exon sequencing.
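
The three percentage rows and the adapter correction in the footnote on the shotgun catch can be re-derived from the megabase totals in the table. The following is a minimal sketch in Python, not part of the original analysis. The dictionary keys and the function name `specificity_bounds` are illustrative assumptions, and the inputs are the rounded Mb values as tabulated, so the results agree with the table only to the nearest percent.

```python
# Megabase totals copied from Table 1 (rounded values as published).
# "on_or_near" uses the tabulated figure rather than on_target + near_target,
# because the published rows are rounded independently (e.g. 51 + 38 vs 90).
CATCHES = {
    "36-base GA-I end": {"raw": 152, "unique": 85, "on_target": 36, "on_or_near": 76},
    "36-base GA-I shotgun": {"raw": 219, "unique": 102, "on_target": 51, "on_or_near": 90},
    "76-base GA-II end": {"raw": 851, "unique": 492, "on_target": 235, "on_or_near": 445},
}


def percent(numerator, denominator):
    """Return numerator/denominator as a whole-number percentage."""
    return round(100 * numerator / denominator)


def specificity_bounds(row, adapter_mb=0):
    """Recompute the three percentage rows for one sequencing run.

    adapter_mb is the amount of non-human adapter sequence (in Mb) to
    exclude from the raw total before computing the lower bound.
    """
    return {
        # On or near target, relative to uniquely aligned sequence.
        "upper": percent(row["on_or_near"], row["unique"]),
        # On or near target, relative to all raw bases (minus adapters).
        "lower": percent(row["on_or_near"], row["raw"] - adapter_mb),
        # Strictly on target, relative to uniquely aligned sequence.
        "on_target": percent(row["on_target"], row["unique"]),
    }


if __name__ == "__main__":
    for name, row in CATCHES.items():
        print(name, specificity_bounds(row))
    # Adapter-corrected lower bound for the shotgun catch: 90 / (219 - 39).
    print(specificity_bounds(CATCHES["36-base GA-I shotgun"], adapter_mb=39))
```

Passing `adapter_mb=39` for the shotgun catch reproduces the corrected lower bound of 90/180 = 50% given in the footnote; without the correction the same run yields 41%.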