2018 Jun 27;34(13):i295–i303. doi: 10.1093/bioinformatics/bty287

Table 3.

Performance of the TransClust algorithm in super-family and family clustering for all protein similarity computation methods with Precision, Recall and F-measure values

                                     Super-family                                Family
Method                      Dataset  No. Clusters  Precision  Recall  F-measure  No. Clusters  Precision  Recall  F-measure

Protein sequence based
Blast (e-value)             A-50     1596          0.997      0.261   0.350      1636          1.0        0.399   0.500
Blast (identity)            A-50     606           0.861      0.550   0.595      660           0.781      0.668   0.631
Protein Word frequency      A-50     708           0.952      0.621   0.686      688           0.844      0.777   0.744
ProtVec Avg (word)          A-50     655           0.927      0.620   0.681      704           0.845      0.757   0.739
ProtVec Avg (char)          A-50     707           0.940      0.603   0.674      707           0.842      0.746   0.729
ProtVec MinMax (word)       A-50     586           0.891      0.623   0.667      704           0.829      0.741   0.718

Ligand based
SMILES Word frequency       A-50     801           0.951      0.548   0.624      957           0.934      0.658   0.704
SMILESVec (word, chembl)    A-50     621           0.921      0.621   0.677      730           0.855      0.744   0.735
SMILESVec (word, pubchem)   A-50     573           0.888      0.627   0.668      692           0.839      0.751   0.730
SMILESVec (word, combined)  A-50     617           0.923      0.627   0.675      764           0.873      0.732   0.735
SMILESVec (char, chembl)    A-50     636           0.920      0.621   0.678      710           0.844      0.743   0.729
SMILESVec (char, pubchem)   A-50     714           0.941      0.600   0.671      715           0.845      0.744   0.729
SMILESVec (char, combined)  A-50     712           0.949      0.602   0.675      712           0.850      0.749   0.739
MACCS                       A-50     589           0.909      0.629   0.679      683           0.839      0.757   0.736
ECFP6                       A-50     611           0.917      0.627   0.679      725           0.860      0.746   0.733

Note: The best F-measure values for the Protein sequence- and Ligand-based methods are shown in bold.
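
Because bold formatting does not survive in plain text, the highest F-measure in each group can be recovered from the numbers themselves. The following is a minimal Python sketch, not part of the original article: the F-measure values are copied from the table, while the names `ROWS` and `best_methods` and the group labels `"protein"` and `"ligand"` are hypothetical. Ties are reported in full, since the text alone does not show which tied entries were bold in the original.

```python
# F-measure values copied from Table 3.
# Tuple layout: (group, method, super-family F-measure, family F-measure).
# The group labels "protein" / "ligand" are illustrative shorthand.
ROWS = [
    ("protein", "Blast (e-value)", 0.350, 0.500),
    ("protein", "Blast (identity)", 0.595, 0.631),
    ("protein", "Protein Word frequency", 0.686, 0.744),
    ("protein", "ProtVec Avg (word)", 0.681, 0.739),
    ("protein", "ProtVec Avg (char)", 0.674, 0.729),
    ("protein", "ProtVec MinMax (word)", 0.667, 0.718),
    ("ligand", "SMILES Word frequency", 0.624, 0.704),
    ("ligand", "SMILESVec (word, chembl)", 0.677, 0.735),
    ("ligand", "SMILESVec (word, pubchem)", 0.668, 0.730),
    ("ligand", "SMILESVec (word, combined)", 0.675, 0.735),
    ("ligand", "SMILESVec (char, chembl)", 0.678, 0.729),
    ("ligand", "SMILESVec (char, pubchem)", 0.671, 0.729),
    ("ligand", "SMILESVec (char, combined)", 0.675, 0.739),
    ("ligand", "MACCS", 0.679, 0.736),
    ("ligand", "ECFP6", 0.679, 0.733),
]


def best_methods(rows, group, level):
    """Return (best F-measure, [methods reaching it]) for one group.

    `level` is "superfamily" or "family"; tied methods are all returned.
    """
    idx = {"superfamily": 2, "family": 3}[level]
    in_group = [row for row in rows if row[0] == group]
    best = max(row[idx] for row in in_group)
    return best, [row[1] for row in in_group if row[idx] == best]


if __name__ == "__main__":
    for group in ("protein", "ligand"):
        for level in ("superfamily", "family"):
            value, methods = best_methods(ROWS, group, level)
            print(f"{group:8s} {level:12s} {value:.3f}  {', '.join(methods)}")
```

On these values, the ligand-based super-family column has a tie between MACCS and ECFP6 at 0.679; the other three columns each have a single best method.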