2018 Jun 27;34(13):i295–i303. doi: 10.1093/bioinformatics/bty287

Table 4.

Performance of the MCL algorithm in super-family and family clustering for all protein similarity computation methods with Precision, Recall and F-measure values

| Method | Dataset | Super-family: No. Clusters | Super-family: Precision | Super-family: Recall | Super-family: F-measure | Family: No. Clusters | Family: Precision | Family: Recall | Family: F-measure |
|---|---|---|---|---|---|---|---|---|---|
| Protein sequence based | | | | | | | | | |
| Blast (e-value) | A-50 | 728 | 0.792 | 0.271 | 0.290 | 728 | 0.687 | 0.406 | 0.379 |
| Blast (identity) | A-50 | 783 | 0.882 | 0.496 | 0.540 | 783 | 0.803 | 0.622 | 0.592 |
| Protein Word frequency | A-50 | 411 | 0.769 | 0.625 | 0.590 | 411 | 0.643 | 0.767 | 0.606 |
| ProtVec avg (word) | A-50 | 1001 | 0.964 | 0.514 | **0.596** | 1001 | 0.909 | 0.639 | **0.665** |
| ProtVec avg (char) | A-50 | 1017 | 0.964 | 0.508 | 0.590 | 1017 | 0.910 | 0.633 | 0.662 |
| ProtVec MinMax (word) | A-50 | 1014 | 0.964 | 0.508 | 0.590 | 1014 | 0.909 | 0.634 | 0.662 |
| Ligand based | | | | | | | | | |
| SMILES Word frequency | A-50 | 312 | 0.630 | 0.550 | 0.470 | 312 | 0.497 | 0.686 | 0.475 |
| SMILESVec (word, chembl) | A-50 | 867 | 0.937 | 0.544 | **0.608** | 867 | 0.870 | 0.672 | 0.667 |
| SMILESVec (word, pubchem) | A-50 | 857 | 0.931 | 0.544 | 0.604 | 857 | 0.861 | 0.673 | 0.664 |
| SMILESVec (word, combined) | A-50 | 894 | 0.940 | 0.540 | 0.607 | 894 | 0.877 | 0.666 | 0.668 |
| SMILESVec (char, chembl) | A-50 | 999 | 0.962 | 0.514 | 0.596 | 999 | 0.908 | 0.641 | 0.668 |
| SMILESVec (char, pubchem) | A-50 | 977 | 0.958 | 0.514 | 0.595 | 977 | 0.900 | 0.643 | 0.667 |
| SMILESVec (char, combined) | A-50 | 1006 | 0.963 | 0.514 | 0.595 | 1006 | 0.909 | 0.641 | **0.669** |
| MACCS | A-50 | 874 | 0.936 | 0.540 | 0.606 | 874 | 0.866 | 0.668 | 0.667 |
| ECFP6 | A-50 | 618 | 0.863 | 0.582 | 0.599 | 618 | 0.762 | 0.710 | 0.631 |

Note: The best F-measure values for the protein sequence-based and ligand-based methods are shown in bold.
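
For readers who want to see how scores of this kind can be derived from a clustering, the following is a minimal sketch, not the authors' evaluation code. It assumes (this is not stated in the table) that each cluster is scored against its majority class and that the per-cluster values are then averaged. The function name `cluster_scores` and its arguments are hypothetical.

```python
from collections import Counter, defaultdict


def cluster_scores(true_labels, cluster_ids):
    """Return (precision, recall, F-measure), each averaged over clusters.

    Assumption: a cluster is scored against its most frequent true label,
    and the final scores are unweighted means of the per-cluster values.
    """
    label_totals = Counter(true_labels)  # size of each true class

    # Group the true labels of the items that fall into each cluster.
    members = defaultdict(list)
    for label, cid in zip(true_labels, cluster_ids):
        members[cid].append(label)

    precisions, recalls, f_measures = [], [], []
    for labels in members.values():
        majority_label, hits = Counter(labels).most_common(1)[0]
        p = hits / len(labels)                  # purity of the cluster
        r = hits / label_totals[majority_label]  # share of the class recovered
        precisions.append(p)
        recalls.append(r)
        f_measures.append(2 * p * r / (p + r))   # harmonic mean per cluster

    n = len(members)
    return sum(precisions) / n, sum(recalls) / n, sum(f_measures) / n


# Toy example: five proteins, two true families, two clusters.
precision, recall, f_measure = cluster_scores(
    ["a", "a", "a", "b", "b"],
    [0, 0, 1, 1, 1],
)
print(precision, recall, f_measure)
```

Under this averaging scheme the mean F-measure need not equal the harmonic mean of the mean precision and mean recall, which is one possible reason the F-measure column in a table like this cannot be recomputed directly from the other two columns.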