2007 Nov 26;2:32. doi: 10.1186/1745-6150-2-32

Table 2.

Statistics of the datasets used in this study. Columns, from left to right: the name of the organism; the number of protein-coding genes in its genome, Ngenes; the number of proteins for which we found at least one paralogous partner; the percentage of proteins with at least one paralog; the total number of distinct BLAST hits generated before we applied subsequent filtering; the number of paralogous pairs included in Na(p); and the number of paralogous pairs included in Nd(p).

| Organism | Proteome size | Number of proteins with paralogs | % of proteins with paralogs | BLASTP hits | Number of pairs in Na(p) | Number of pairs in Nd(p) |
|---|---|---|---|---|---|---|
| H. pylori | 1590 | 230 | 14% | 3228 | 260 | 148 |
| E. coli | 4288 | 1428 | 33% | 16768 | 2614 | 1013 |
| S. cerevisiae | 5885 | 1689 | 29% | 43915 | 2297 | 1025 |
| C. elegans | 19099 | 6894 | 36% | 204398 | 46463 | 5545 |
| D. melanogaster | 14015 | 4153 | 30% | 557047 | 17621 | 3238 |
| H. sapiens | 25319 | 9252 | 37% | 1330721 | 31078 | 6595 |
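
As a quick consistency check, the percentage column can be recomputed from the proteome size and the number of proteins with paralogs. The sketch below is illustrative and not part of the original study: the names `TABLE2` and `percent_with_paralogs` are hypothetical, and rounding to the nearest whole percent is an assumption about how the published values were obtained.

```python
# Rows copied from Table 2: (organism, proteome size, proteins with paralogs).
TABLE2 = [
    ("H. pylori", 1590, 230),
    ("E. coli", 4288, 1428),
    ("S. cerevisiae", 5885, 1689),
    ("C. elegans", 19099, 6894),
    ("D. melanogaster", 14015, 4153),
    ("H. sapiens", 25319, 9252),
]


def percent_with_paralogs(proteome_size, with_paralogs):
    """Share of proteins with at least one paralog, as a whole-number percentage.

    Assumption: the table reports values rounded to the nearest integer.
    """
    return round(100 * with_paralogs / proteome_size)


# Recompute the fourth column of the table for every organism.
percentages = {
    name: percent_with_paralogs(size, n_paralogs)
    for name, size, n_paralogs in TABLE2
}

for name, pct in percentages.items():
    print(f"{name}: {pct}%")
```

Under that rounding assumption, the recomputed values agree with the percentages listed in the table for all six organisms.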