TABLE 1.
Target database or sequence | Date posted | No. of sequences | No. of letters | Method | Query (GSS data set)ᵃ | Result of query |
---|---|---|---|---|---|---|
GenBank Alveolata | 8 August 2000 | 36,628 | 29,939,175 | BLASTN 2.0.14 | hq | 3 rRNA genes; 6 protein coding genes |
Swall | | | | BLASTX 2.0.12 | nr | 722 ORFs |
Swissprot | 7 July 2000 | 86,593 | 31,411,157 | | | |
Swissprotnew | 8 July 2000 | 2,822 | 1,280,954 | | | |
Sptrembl | 28 June 2000 | 297,973 | 93,374,136 | | | |
Sptrnew | 8 July 2000 | 71,201 | 22,367,698 | | | |
Pfam | 21 September 2000 | 111,934 | 38,089,901 | BLASTP 2.0.14 | ORF | 524 PfamA domains |
Proteome | | | | BLASTP 2.0.14 | ORF | |
E. coli | 4 January 2001 | 4,581 | 1,435,304 | | | 101 hitsᵇ |
S. cerevisiae | 15 November 2000 | 6,358 | 2,991,939 | | | 505 hitsᵇ |
C. elegans | 15 November 2000 | 19,704 | 8,596,400 | | | 586 hitsᵇ |
D. melanogaster | 15 November 2000 | 14,080 | 6,850,524 | | | 599 hitsᵇ |
A. thaliana | 6 January 2001 | 25,458 | 11,049,032 | | | 588 hitsᵇ |
H. sapiens | 15 November 2000 | 31,919 | 13,433,624 | | | 616 hitsᵇ |
hq | 2 November 2000 | 3,046 | 1,403,056 | TBLASTX 2.0.14 | hq | 108 families |
ᵃ The genome survey sequence (GSS) data set used for each query is as follows: nr, nonredundant data (3,139 reads, 3,481,394 nt); hq, high-quality nonredundant data (3,046 reads, 1,403,056 nt); and ORF, set of 722 ORF fragments identified by similarity (107,303 aa).
ᵇ Threshold, E < 10⁻⁴.