2021 Feb 12;26(4):966. doi: 10.3390/molecules26040966

Figure 4.


MCC (Matthews correlation coefficient) as a function of training dataset size for 160 different sequence-based models. For each of the ten zinc-binding site families, 9 classifiers were trained on 20–100% of the original, unclustered data (10 × 9 = 90 models), and additional classifiers were trained on sequences clustered at 40–100% sequence identity (10 × 7 = 70 models). Performance (MCC) is plotted against the size of the training dataset. The two modes of dataset reduction are shown in different shades, and the resulting curves are not significantly different. This suggests that homology between training and test sets does not influence a model's performance; rather, performance is a function of training dataset size.
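As a rough illustration of the design described in this caption, the sketch below computes MCC from confusion-matrix counts and enumerates the two dataset-reduction grids. It is a hypothetical reconstruction, not the authors' code: the family names are placeholders, the 10% step sizes are inferred from the model counts (9 levels from 20% to 100%, 7 levels from 40% to 100%), and the actual subsampling, sequence clustering and training steps are not implemented.

```python
import math
from itertools import product


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal sum is zero (a common convention)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical placeholder names for the ten zinc-binding site families.
FAMILIES = [f"family_{i}" for i in range(1, 11)]
# Assumed 10% steps: 20%, 30%, ..., 100% of the unclustered data (9 levels).
SUBSAMPLE_FRACTIONS = [p / 100 for p in range(20, 101, 10)]
# Assumed 10% steps: 40%, 50%, ..., 100% sequence-identity cutoffs (7 levels).
IDENTITY_CUTOFFS = [c / 100 for c in range(40, 101, 10)]


def experiment_grid():
    """List one (family, reduction_mode, level) triple per trained model."""
    grid = [(fam, "subsample", p) for fam, p in product(FAMILIES, SUBSAMPLE_FRACTIONS)]
    grid += [(fam, "cluster", c) for fam, c in product(FAMILIES, IDENTITY_CUTOFFS)]
    return grid


if __name__ == "__main__":
    print(len(experiment_grid()))  # 10 * 9 + 10 * 7 = 160 models
    print(mcc(tp=5, tn=5, fp=0, fn=0))  # perfect classifier: 1.0
```

Running the script prints 160 and 1.0, matching the model count stated in the caption. In a full analysis, each grid entry would be paired with its training-set size and test-set MCC, and the two reduction modes would be plotted as separate curves.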