Table 1.
Global species and sequence content in PhEVER
| Taxonomic group | No. of species | No. of proteins<sup>a</sup> | No. of proteins in families<sup>b</sup> | Data source |
|---|---|---|---|---|
| All | 3476 | 4 515 271 | 333 618 (7%) | – |
| Viruses | 2426 | 82 929 (2718 m.p.<sup>c</sup>) | 82 784 (100%) | RefSeq |
| Bacteria | 937 | 3 207 914 | 232 066 (7%) | Genome Reviews |
| Archaea | 70 | 158 919 | 6702 (4%) | Genome Reviews |
| Eukarya | 43 | 1 065 509 | 12 066 (1%) | Ensembl + Genome Reviews |
| Eukarya (excl. Anopheles)<sup>d</sup> | – | 580 567 | 11 201 (2%) | – |
<sup>a</sup> Proteins correspond to translated annotated CDS. The number of proteins in the protein database is therefore equal to the number of CDS in the nucleic database.
<sup>b</sup> Number of proteins associated with a family, followed by the proportion of proteins in the taxonomic group that are associated with a family.
<sup>c</sup> 2718 mature peptides (m.p.) are added to the 80 210 proteins translated from all CDS.
<sup>d</sup> The genome of A. gambiae contains data from different haplotypes and therefore presents a high level of redundancy.
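
The proportions defined in footnote b can be recomputed from the raw counts in the table. The following is a minimal sketch for illustration only: the dictionary `counts` and the helper `family_share` are names introduced here and are not part of PhEVER.

```python
# Counts transcribed from Table 1 as
# (number of proteins, number of proteins associated with a family).
# Hypothetical names: `counts` and `family_share` are illustrative only.
counts = {
    "Viruses": (82_929, 82_784),
    "Bacteria": (3_207_914, 232_066),
    "Archaea": (158_919, 6_702),
    "Eukarya": (1_065_509, 12_066),
}


def family_share(n_proteins, n_in_family):
    """Percentage of proteins associated with a family (footnote b)."""
    return 100.0 * n_in_family / n_proteins


# The "All" row is the sum over the four taxonomic groups.
total_proteins = sum(p for p, _ in counts.values())
total_in_family = sum(f for _, f in counts.values())

for group, (p, f) in counts.items():
    print(f"{group}: {f} of {p} proteins ({family_share(p, f):.0f}%)")

print(
    f"All: {total_in_family} of {total_proteins} proteins "
    f"({family_share(total_proteins, total_in_family):.0f}%)"
)
```

Summing the per-group counts before dividing, rather than averaging the per-group percentages, keeps the overall figure weighted by the size of each group.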