2001 Sep;11(9):1594–1602. doi: 10.1101/gr.186901

Table 1.

Comparison of Normalized Libraries

Library | MPMGp609 | 1NIB | 1NFLS
Average insert size (kb) | 1.9 | 1.5 | 0.9
Stringent protein hits (5′ reads) | 42.6% | 53.6% | 15.2%
Coding prediction for 5′ reads | 74.8% | 43.8% | 33.7%
Coding prediction for 5′ reads with stringent protein hits | 93.5% | 70.3% | 53.8%
Estimate of clones containing initiating methionine | 33.9% | 17.5% | 11.6%
3′ ESTs with consensus poly-adenylation signal | 76.0% | 77.6% | 47.6%
Sequence diversity (3′ clusters from 10,600 3′ reads) | 68% | 69% | 80%

We compared the three normalized libraries: MPMGp609, 1NIB, and 1NFLS. Average insert size (based on PCR products from at least 96 randomly chosen clones) and stringent protein hits (WU-BLASTX E-value of e-30 or lower) indicate which transcript fragments are being examined (e.g., whether the 5′ end of the insert falls in the UTR, the protein-coding region, or an artifactual unspliced non-coding region). We searched for rarer, novel, or diverged transcripts with the gene prediction programs GenScan (Burge and Karlin 1997) and MZEF (Zhang 1997); a hit with either program was taken as indicative of coding sequence. The efficacy of these gene prediction programs on ESTs was tested using 5′ ESTs with stringent protein hits (i.e., known to be coding). The MPMGp609 library performed better than the other libraries, presumably because a zebrafish EST must have a much longer alignment with a mammalian protein than a human EST to produce a given score (here e-30) and thus contains a longer piece of coding sequence that is more likely to be detected. As a further quality measure, we examined the proportion of 5′ ESTs with stringent protein hits that matched the first methionine of their protein sequence hit. We also examined the proportion of 3′ ESTs containing one of the consensus poly-adenylation signals (AAUAAA or AUUAAA), which are present in 80% of eukaryotic 3′ UTRs (Pesole et al. 2000). To measure library complexity, we clustered 10,600 3′ ESTs with at least 200 high-quality bases (as demarcated in their dbEST entry) using PHRAP. The number of 3′ clusters (counting singletons as clusters) was taken as a measure of the number of transcripts identified.
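
The last two rows of the table can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names are hypothetical, the reads are assumed to be already oriented as the sense strand, and the whole read is scanned because the text does not say which part of each 3′ EST was searched. The signals are written in the DNA alphabet (AATAAA, ATTAAA), since ESTs are cDNA reads.

```python
import re

# Consensus poly-adenylation signals from the text (AAUAAA, AUUAAA),
# expressed in the DNA alphabet used by EST reads.
POLYA_SIGNAL = re.compile(r"AATAAA|ATTAAA")


def has_polya_signal(est: str) -> bool:
    """Return True if the read contains a consensus poly-adenylation signal.

    Assumption: the read is given in sense orientation. The whole read is
    scanned; restricting the search to the 3' end would be a refinement.
    """
    # Normalize case and accept RNA-style input by mapping U to T.
    seq = est.upper().replace("U", "T")
    return POLYA_SIGNAL.search(seq) is not None


def signal_fraction(ests: list[str]) -> float:
    """Fraction of reads that carry a consensus signal (0.0 for no reads)."""
    if not ests:
        return 0.0
    return sum(has_polya_signal(e) for e in ests) / len(ests)


def cluster_diversity(n_clusters: int, n_reads: int) -> float:
    """Sequence diversity as clusters (singletons included) per clustered read."""
    if n_reads <= 0:
        raise ValueError("n_reads must be positive")
    return n_clusters / n_reads
```

Under this definition, a diversity of 68% from 10,600 reads would correspond to about 7,208 clusters, since `cluster_diversity(7208, 10600)` evaluates to 0.68. The clustering itself (PHRAP in the text) is outside the scope of this sketch.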