2019 Aug 30;9:12603. doi: 10.1038/s41598-019-48913-8

Table 4.

Classification performance of our RF models, PRODIGY-CRYSTAL, and EPPIC based on Uniref100 (2016_07).

Performance on the large-scale dataset
Method	Sensitivity	Specificity	Accuracy	MCC
RF model^*1	90%	93%	91%	0.82
RF model^*2	92% (92%)	94% (94%)	93% (93%)	0.86 (0.86)
RF model^*3	95%	96%	95%	0.91
PRODIGY-CRYSTAL^*4	91%	94%	92%	0.85
EPPIC	90%	88%	89%	0.78

*1: The model was trained on the Duarte dataset; it is the same model described in Table 2 that exploits CS features.
*2: The model was trained on the merged dataset (the Duarte, Bahadur, and Zhu datasets combined). Numbers in parentheses give the performance after removing the 35 entries that overlap between the merged and the large-scale datasets.
*3: The model was trained and evaluated on the large-scale dataset using 10-fold cross-validation.
*4: Because the classifier was trained on the same dataset, 10-fold cross-validation was conducted. For our RF model and PRODIGY-CRYSTAL, the same partitioning was applied for each fold.
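
For readers who want to relate the four reported measures to one another, the following is a minimal sketch of how sensitivity, specificity, accuracy, and the Matthews correlation coefficient (MCC) can be computed from confusion-matrix counts using their standard definitions. The function name `binary_metrics` and the example counts are illustrative assumptions and are not taken from the paper; which interface class is treated as positive is also assumed here.

```python
import math


def _ratio(numerator, denominator):
    """Divide safely, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary classification measures from counts.

    tp, fp, tn, fn are the numbers of true positives, false positives,
    true negatives, and false negatives. Which class counts as
    "positive" is an assumption made by the caller.
    """
    sensitivity = _ratio(tp, tp + fn)            # true positive rate
    specificity = _ratio(tn, tn + fp)            # true negative rate
    accuracy = _ratio(tp + tn, tp + fp + tn + fn)
    # MCC: correlation between predicted and true labels, in [-1, 1].
    mcc_denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    mcc = _ratio(tp * tn - fp * fn, mcc_denominator)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts for illustration only (not data from the paper).
example = binary_metrics(tp=90, fp=7, tn=93, fn=10)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which makes it a useful summary for comparing classifiers when the two classes differ in size.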