Table 1. Comparison of search strategies for H3 histone sequences.
search strategy | uniq gi | H3 | success | efficiency |
---|---|---|---|---|
reference H3 set | 1742 | 1742 | | |
ENTREZ “eukaryota[ORGN]” | 1143461 | 1742 | 100.0% | 0.2% |
ENTREZ “H3” | 3303 | 1452 | 83.4% | 44.0% |
ENTREZ “histon” | 9297 | 1653 | 94.9% | 17.8% |
ENTREZ “eukaryota[ORGN] and H3” | 2703 | 1452 | 83.4% | 53.7% |
ENTREZ “eukaryota[ORGN] and histon” | 7453 | 1653 | 94.9% | 22.2% |
BLASTPGP H3human | 1747 | 1719 | 98.7% | 98.4% |
BLASTPGP H3human+seg | 1747 | 1719 | 98.7% | 98.4% |
BLASTPGP H3human+eukgi | 1754 | 1722 | 98.9% | 98.2% |
BLASTPGP H3human+eukgi+seg | 1754 | 1722 | 98.9% | 98.2% |
BLASTPGP H3yeast | 1777 | 1718 | 98.6% | 96.7% |
BLASTPGP H3yeast+seg | 1777 | 1718 | 98.6% | 96.7% |
BLASTPGP H3yeast+eukgi | 1780 | 1718 | 98.6% | 96.5% |
BLASTPGP H3yeast+eukgi+seg | 1780 | 1718 | 98.6% | 96.5% |
PSIBLASTPGP H3human | 1897 | 1726 | 99.1% | 91.0% |
PSIBLASTPGP H3human+seg | 1897 | 1726 | 99.1% | 91.0% |
PSIBLASTPGP H3human+eukgi | 1949 | 1727 | 99.1% | 88.6% |
PSIBLASTPGP H3human+eukgi+seg | 1949 | 1727 | 99.1% | 88.6% |
PSIBLASTPGP H3yeast | 2011 | 1726 | 99.1% | 85.8% |
PSIBLASTPGP H3yeast+seg | 2011 | 1726 | 99.1% | 85.8% |
PSIBLASTPGP H3yeast+eukgi | 2077 | 1727 | 99.1% | 83.1% |
PSIBLASTPGP H3yeast+eukgi+seg | 2077 | 1727 | 99.1% | 83.1% |
WINBLASTPGP H3human | 69678 | 1730 | 99.3% | 2.5% |
WINBLASTPGP H3human+eukgi | 60821 | 1732 | 99.4% | 2.8% |
WINBLASTPGP H3human+eukgi+seg | 1697 | 1646 | 94.5% | 97.0% |
WINBLASTPGP H3yeast | 70864 | 1730 | 99.3% | 2.4% |
WINBLASTPGP H3yeast+eukgi | 63949 | 1730 | 99.3% | 2.7% |
WINBLASTPGP H3yeast+eukgi+seg | 1788 | 1646 | 94.5% | 92.1% |
BLASTPGP = gapped protein BLAST; PSIBLASTPGP = iterated gapped protein BLAST using profiles; WINBLASTPGP = gapped protein BLAST for short, nearly exact matches, using sequence windows as queries; eukgi = search restricted to sequences from eukaryotes; seg = SEG filtering of low-complexity regions enabled. All results were compared to a curated reference H3 set of 1742 sequences. Column headers: uniq gi = number of unique sequence records (gis) retrieved; H3 = number of retrieved unique gis shared with the reference set; success = H3 as a percentage of the reference set; efficiency = H3 as a percentage of uniq gi.
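
The two percentage columns follow directly from the counts in each row. The sketch below recomputes them for one row, as a check on the definitions given above. The function name `success_and_efficiency` and its parameter names are illustrative and do not come from the original work. The reference set size of 1742 is taken from the first row of the table.

```python
def success_and_efficiency(uniq_gi, h3_hits, reference_size=1742):
    """Return (success, efficiency) as percentages rounded to one decimal.

    success    = H3 hits as a percentage of the reference set
    efficiency = H3 hits as a percentage of the unique gis retrieved
    """
    success = 100.0 * h3_hits / reference_size
    efficiency = 100.0 * h3_hits / uniq_gi
    return round(success, 1), round(efficiency, 1)


# Row: ENTREZ "H3" (3303 unique gis retrieved, 1452 of them in the reference set)
print(success_and_efficiency(3303, 1452))  # -> (83.4, 44.0)
```

A search with high success but low efficiency, such as the unrestricted WINBLASTPGP runs, recovers nearly all reference sequences but buries them among many unrelated records.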