Table 4.
Method | Dataset | No. Clusters (super-family) | Precision (super-family) | Recall (super-family) | F-measure (super-family) | No. Clusters (family) | Precision (family) | Recall (family) | F-measure (family) |
---|---|---|---|---|---|---|---|---|---|
Protein sequence based | |||||||||
Blast (e-value) | A-50 | 728 | 0.792 | 0.271 | 0.290 | 728 | 0.687 | 0.406 | 0.379 |
Blast (identity) | A-50 | 783 | 0.882 | 0.496 | 0.540 | 783 | 0.803 | 0.622 | 0.592 |
Protein Word frequency | A-50 | 411 | 0.769 | 0.625 | 0.590 | 411 | 0.643 | 0.767 | 0.606 |
ProtVec avg (word) | A-50 | 1001 | 0.964 | 0.514 | **0.596** | 1001 | 0.909 | 0.639 | **0.665** |
ProtVec avg (char) | A-50 | 1017 | 0.964 | 0.508 | 0.590 | 1017 | 0.910 | 0.633 | 0.662 |
ProtVec MinMax (word) | A-50 | 1014 | 0.964 | 0.508 | 0.590 | 1014 | 0.909 | 0.634 | 0.662 |
Ligand based | |||||||||
SMILES Word frequency | A-50 | 312 | 0.630 | 0.550 | 0.470 | 312 | 0.497 | 0.686 | 0.475 |
SMILESVec (word, chembl) | A-50 | 867 | 0.937 | 0.544 | **0.608** | 867 | 0.870 | 0.672 | 0.667 |
SMILESVec (word, pubchem) | A-50 | 857 | 0.931 | 0.544 | 0.604 | 857 | 0.861 | 0.673 | 0.664 |
SMILESVec (word, combined) | A-50 | 894 | 0.940 | 0.540 | 0.607 | 894 | 0.877 | 0.666 | 0.668 |
SMILESVec (char, chembl) | A-50 | 999 | 0.962 | 0.514 | 0.596 | 999 | 0.908 | 0.641 | 0.668 |
SMILESVec (char, pubchem) | A-50 | 977 | 0.958 | 0.514 | 0.595 | 977 | 0.900 | 0.643 | 0.667 |
SMILESVec (char, combined) | A-50 | 1006 | 0.963 | 0.514 | 0.595 | 1006 | 0.909 | 0.641 | **0.669** |
MACCS | A-50 | 874 | 0.936 | 0.540 | 0.606 | 874 | 0.866 | 0.668 | 0.667 |
ECFP6 | A-50 | 618 | 0.863 | 0.582 | 0.599 | 618 | 0.762 | 0.710 | 0.631 |
Note: The best F-measure values for the protein sequence-based and ligand-based methods are shown in bold.