2022 Apr 5;5:316. doi: 10.1038/s42003-022-03261-8

Table 1.

Proteomes and structure models considered.

                                                Reference         Unique  Original  Domain   Originals with
Species                        Common name      proteome     UniProt IDs    models  models  no domains (1D)
Arabidopsis thaliana           Arabidopsis      UP000006548       27,434    27,434  37,682             5722
Caenorhabditis elegans         Nematode worm    UP000001940       19,694    19,694  26,160             4277
Candida albicans               C. albicans      UP000000559         5974      5974    9978              743
Danio rerio                    Zebrafish        UP000000437       24,664    24,664  42,135             2530
Dictyostelium discoideum       Dictyostelium    UP000002195       12,622    12,622  18,963             2986
Drosophila melanogaster        Fruit fly        UP000000803       13,458    13,458  19,881             2335
Escherichia coli               E. coli          UP000000625         4363      4363    5397              417
Glycine max                    Soybean          UP000008827       55,799    55,799  72,217           14,146
Homo sapiens                   Human            UP000005640       20,504    23,391  44,827             3302
Leishmania infantum            L. infantum      UP000008153         7924      7924  12,257             1579
Methanocaldococcus jannaschii  M. jannaschii    UP000000805         1773      1773    2097              131
Mus musculus                   Mouse            UP000000589       21,615    21,615  35,216             2477
Mycobacterium tuberculosis     M. tuberculosis  UP000001584         3988      3988    5170              351
Oryza sativa                   Asian rice       UP000059680       43,649    43,649  39,775           19,756
Plasmodium falciparum          P. falciparum    UP000001450         5187      5187    7283             1162
Rattus norvegicus              Rat              UP000002494       21,272    21,272  33,818             2664
Saccharomyces cerevisiae       Budding yeast    UP000002311         6040      6040    9837              967
Schizosaccharomyces pombe      Fission yeast    UP000002485         5128      5128    8173              637
Staphylococcus aureus          S. aureus        UP000008816         2888      2888    3283              415
Trypanosoma cruzi              T. cruzi         UP000002296       19,036    19,036  26,205             5436
Zea mays                       Maize            UP000007305       39,299    39,299  48,433           11,582

For each proteome, the table gives the number of unique proteins, the total numbers of original and domain models, and the number of original models containing no confident domains. The definition of confident domains is given in the main text. The human original model count (23,391; underlined in the published table) does not match the number of unique human proteins (20,504), because the human structure predictions retrieved from the AlphaFold Database include models that are 1400-residue slices of larger proteins.
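The caption's explanation of the human row can be checked with simple arithmetic. The Python sketch below transcribes the counts from Table 1, flags proteomes whose original model count differs from the number of unique proteins, and computes the share of original models with no confident domains. The container and function names (`ProteomeRow`, `mismatched`, `extra_models`, `no_domain_fraction`) are hypothetical, introduced only for illustration; they are not part of the paper's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProteomeRow:
    """One row of Table 1 (hypothetical container for illustration)."""
    species: str
    unique_ids: int        # number of unique UniProt IDs
    original_models: int   # original AlphaFold DB models
    domain_models: int     # domain models
    no_domain_models: int  # original models with no confident domains (1D)


# Counts transcribed from Table 1.
ROWS = [
    ProteomeRow("Arabidopsis thaliana", 27434, 27434, 37682, 5722),
    ProteomeRow("Caenorhabditis elegans", 19694, 19694, 26160, 4277),
    ProteomeRow("Candida albicans", 5974, 5974, 9978, 743),
    ProteomeRow("Danio rerio", 24664, 24664, 42135, 2530),
    ProteomeRow("Dictyostelium discoideum", 12622, 12622, 18963, 2986),
    ProteomeRow("Drosophila melanogaster", 13458, 13458, 19881, 2335),
    ProteomeRow("Escherichia coli", 4363, 4363, 5397, 417),
    ProteomeRow("Glycine max", 55799, 55799, 72217, 14146),
    ProteomeRow("Homo sapiens", 20504, 23391, 44827, 3302),
    ProteomeRow("Leishmania infantum", 7924, 7924, 12257, 1579),
    ProteomeRow("Methanocaldococcus jannaschii", 1773, 1773, 2097, 131),
    ProteomeRow("Mus musculus", 21615, 21615, 35216, 2477),
    ProteomeRow("Mycobacterium tuberculosis", 3988, 3988, 5170, 351),
    ProteomeRow("Oryza sativa", 43649, 43649, 39775, 19756),
    ProteomeRow("Plasmodium falciparum", 5187, 5187, 7283, 1162),
    ProteomeRow("Rattus norvegicus", 21272, 21272, 33818, 2664),
    ProteomeRow("Saccharomyces cerevisiae", 6040, 6040, 9837, 967),
    ProteomeRow("Schizosaccharomyces pombe", 5128, 5128, 8173, 637),
    ProteomeRow("Staphylococcus aureus", 2888, 2888, 3283, 415),
    ProteomeRow("Trypanosoma cruzi", 19036, 19036, 26205, 5436),
    ProteomeRow("Zea mays", 39299, 39299, 48433, 11582),
]


def mismatched(rows):
    """Rows whose original model count differs from the unique-protein count."""
    return [r for r in rows if r.original_models != r.unique_ids]


def extra_models(row):
    """Original models beyond one per unique protein (e.g. fragment slices)."""
    return row.original_models - row.unique_ids


def no_domain_fraction(row):
    """Share of original models that contain no confident domain."""
    return row.no_domain_models / row.original_models


if __name__ == "__main__":
    for r in mismatched(ROWS):
        print(f"{r.species}: {extra_models(r)} extra original models")
    worst = max(ROWS, key=no_domain_fraction)
    print(f"Highest no-domain share: {worst.species} ({no_domain_fraction(worst):.1%})")
```

Run as a script, it should report 2887 extra human models (23,391 − 20,504) and Oryza sativa as the proteome with the highest share of original models lacking confident domains, at about 45%.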