Table 4.
Taxonomic distribution of unigene blastx hits in the nr database
| Taxonomic category | Best blastx hit: number of unigenes | Best blastx hit: percent of unigenes with hits | Lowest common ancestor for blastx hits: number of unigenes | Lowest common ancestor for blastx hits: percent of unigenes with hits |
|---|---|---|---|---|
| Eukaryotes | 33,776 | 97.2% | 32,059 | 92.3% |
| Green plants | 33,406 | 96.2% | 31,373 | 90.3% |
| "Green algae" | 175 | 0.5% | 78 | 0.2% |
| Land plants | 33,231 | 95.7% | 30,822 | 88.7% |
| "Bryophytes" | 394 | 1.1% | 2,197 | 6.3% |
| Vascular plants | 32,837 | 94.5% | 16,731 | 48.2% |
| Lycophytes | 74 | 0.2% | 13 | 0.0% |
| Ferns | 928 | 2.7% | 435 | 1.3% |
| Seed plants | 31,835 | 91.6% | 16,015 | 46.1% |
| Gymnosperms | 8,000 | 23.0% | 866 | 2.5% |
| Angiosperms | 23,835 | 68.6% | 10,572 | 30.4% |
| Animals | 288 | 0.8% | 63 | 0.2% |
| Fungi | 0 | 0.0% | 4 | 0.0% |
| Other eukaryotes | 77 | 0.2% | 12 | 0.0% |
| Bacteria | 22 | 0.1% | 91 | 0.3% |
| Artificial sequences, hits don't pass threshold, or taxon not assigned | 20 | 0.1% | 216 | 0.6% |
Unigenes were searched against the NCBI nr protein database using blastx with an E-value threshold of 1e-10, keeping the best ten hits. Of the 56,256 unigenes, 34,740 (61.8%) had a positive hit. The lowest common ancestor (LCA) assignment for a sequence was calculated using the LCA algorithm implemented in MEGAN v3.9 [61], based on at least three blastx hits with a bitscore greater than 75 and within 10% of the best bitscore. Note: the predicted proteins from Selaginella moellendorffii are not currently included in the nr database and thus are not reflected in these results.
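
The LCA filtering rule described in this note can be illustrated with a minimal Python sketch. It is not MEGAN's implementation: the function name `lca_assignment`, the hit representation (a bitscore paired with a root-to-leaf lineage), and the example lineages are hypothetical, and "within 10% of the best bitscore" is assumed to mean a bitscore of at least 90% of the top hit's bitscore.

```python
from typing import List, Optional, Tuple

# Hypothetical hit record: (bitscore, lineage), with the lineage listed root to leaf.
Hit = Tuple[float, List[str]]


def lca_assignment(
    hits: List[Hit],
    min_bitscore: float = 75.0,   # absolute bitscore floor from the table note
    top_fraction: float = 0.10,   # assumed meaning of "within 10% of the best bitscore"
    min_hits: int = 3,            # minimum number of supporting hits
) -> Optional[str]:
    """Return the deepest taxon shared by all retained hits, or None if unassigned."""
    if not hits:
        return None
    best = max(score for score, _ in hits)
    # Retain hits above the floor and within the top fraction of the best bitscore.
    kept = [
        lineage
        for score, lineage in hits
        if score > min_bitscore and score >= best * (1.0 - top_fraction)
    ]
    if len(kept) < min_hits:
        return None
    # Walk the lineages in parallel from the root until they first disagree.
    common: List[str] = []
    for ranks in zip(*kept):
        if all(rank == ranks[0] for rank in ranks):
            common.append(ranks[0])
        else:
            break
    return common[-1] if common else None


# Illustrative (made-up) hits for one unigene.
seed_plant = ["Eukaryota", "Viridiplantae", "Embryophyta", "Tracheophyta", "Spermatophyta"]
example_hits: List[Hit] = [
    (200.0, seed_plant + ["Magnoliophyta"]),
    (195.0, seed_plant + ["Acrogymnospermae"]),
    (185.0, seed_plant + ["Magnoliophyta"]),
    (120.0, ["Eukaryota", "Metazoa"]),  # dropped: below 90% of the best bitscore
]
assigned = lca_assignment(example_hits)  # "Spermatophyta"
```

Under these assumptions, a unigene whose best hit is an angiosperm is still placed at the seed plant level when a gymnosperm hit scores nearly as well. This is consistent with the LCA columns holding fewer unigenes in terminal categories than the best-hit columns.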