Author manuscript; available in PMC: 2009 Aug 1.

Published in final edited form as: Nat Biotechnol. 2009 Feb 1;27(2):182–189. doi: 10.1038/nbt.1523

Table 1.

Detailed breakdown of Illumina sequences generated from exon catches

Length and kind of Illumina sequencing reads	36-base GA-I end sequences	36-base GA-I shotgun sequences	76-base GA-II end sequences
Aggregate length of target^a	2.5 Mb	2.5 Mb	2.5 Mb
Aggregate length of baits	3.7 Mb	3.7 Mb	3.7 Mb

Total raw unfiltered sequence	152 Mb	219 Mb^b	851 Mb
Raw sequence not aligned uniquely to genome^c	67 Mb	116 Mb	358 Mb
Uniquely aligned human sequence	85 Mb	102 Mb	492 Mb

Uniquely aligned sequence on target	36 Mb	51 Mb	235 Mb
Uniquely aligned sequence near target^d	40 Mb	38 Mb	210 Mb
Uniquely aligned sequence on or near target	76 Mb	90 Mb	445 Mb

Fraction of uniquely aligned sequence on or near target^e	89%	88%	90%
Fraction of raw bases uniquely aligned on or near target^f	50%	41%^g	52%
Fraction of uniquely aligned bases on target^h	42%	50%	48%

^a Protein-coding exon sequence only

^b Each unit of concatenated catch contains 44–46 bases (~18%) of generic adapter sequence. Therefore, ~18% (39 Mb) of the 219 Mb is not of human origin.

^c All raw sequence that fails to align uniquely to the human reference genome including low-quality sequence

^d Outside but within 500 bp of a target exon

^e Upper bound for estimating the specificity of hybrid selection

^f Lower bound for estimating the specificity of hybrid selection

^g The denominator (219 Mb) includes ~39 Mb of sequence from the generic adapters. Excluding these 39 Mb, the lower bound for the estimated specificity of this catch is 90/180 = 50%.

^h Upper bound for the overall specificity of targeted exon sequencing
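
The three percentage rows of Table 1 follow directly from the Mb rows above them. As a minimal sketch of that arithmetic (not the authors' analysis code), the snippet below recomputes them from the rounded values printed in the table. The dictionary keys and the helper name `pct` are illustrative assumptions. The "on or near target" totals are taken from the table row itself rather than re-summed, because the printed Mb values are rounded (for the shotgun run, 51 + 38 gives 89, while the table reports 90).

```python
# Rounded values in Mb, copied from Table 1 (keys are illustrative names).
runs = {
    "36-base GA-I end":     {"raw": 152, "unique": 85,  "on": 36,  "on_or_near": 76},
    "36-base GA-I shotgun": {"raw": 219, "unique": 102, "on": 51,  "on_or_near": 90},
    "76-base GA-II end":    {"raw": 851, "unique": 492, "on": 235, "on_or_near": 445},
}


def pct(numerator, denominator):
    """Return numerator/denominator as a whole-number percentage."""
    return round(100 * numerator / denominator)


summary = {}
for name, r in runs.items():
    summary[name] = {
        # Uniquely aligned sequence on or near target / uniquely aligned sequence
        "upper_bound_hybrid_selection": pct(r["on_or_near"], r["unique"]),
        # Uniquely aligned sequence on or near target / total raw sequence
        "lower_bound_hybrid_selection": pct(r["on_or_near"], r["raw"]),
        # Uniquely aligned sequence on target / uniquely aligned sequence
        "upper_bound_overall": pct(r["on"], r["unique"]),
    }

# Adapter correction for the shotgun run: the table footnote removes
# ~39 Mb of generic adapter sequence from the 219 Mb denominator.
adapter_mb = 39
shotgun = runs["36-base GA-I shotgun"]
shotgun_corrected = pct(shotgun["on_or_near"], shotgun["raw"] - adapter_mb)

for name, values in summary.items():
    print(name, values)
print("shotgun lower bound, adapters excluded:", shotgun_corrected)
```

Because the inputs are the rounded Mb figures, this check only confirms that the printed percentages are consistent with the printed totals. It does not reproduce the base-level counts behind them.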