Table 3.
Specificity of SVM | Positive training points | Kernel type | LOO-CV Error (%) | LOO-CV Sn (%) | LOO-CV Sp (%) | LOO-CV MCC (%) | Quality of SVM |
---|---|---|---|---|---|---|---|
Large clusters | 282 Labeled and 664 unlabeled data points (18 + 646) | ||||||
Dhb=Sal | 11 | l | 0.4 | 100 | 92 | 96 | ++ |
Asp=Asn=Glu=Gln=Aad | 43 | r | 1.4 | 100 | 91 | 95 | ++ |
Pro=Pip | 20 | r | 0.7 | 90 | 100 | 95 | ++ |
Cys | 17 | r | 0.7 | 100 | 89 | 94 | ++ |
Ser=Thr=Dhpg=Dpg=Hpg | 50 | r | 2.5 | 96 | 91 | 92 | ++ |
Gly=Ala=Val=Leu=Ile=Abu=Iva | 92 | r | 4.3 | 95 | 93 | 90 | + |
Orn=Lys=Arg | 16 | l | 0.7 | 88 | 88 | 87 | + |
Phe=Trp=Phg=Tyr=Bht | 33 | r | 3.2 | 88 | 85 | 85 | 0 |
Small clusters | 273 Labeled and 673 unlabeled data points (27 + 646) | ||||||
Dhb=Sal | 11 | l | 0 | 100 | 100 | 100 | ++ |
Aad | 7 | l | 0 | 100 | 100 | 100 | ++ |
Glu=Gln | 15 | l | 0 | 100 | 100 | 100 | ++ |
Dhpg=Dpg=Hpg | 20 | l | 0.4 | 100 | 95 | 97 | ++ |
Ser | 13 | l | 0.4 | 92 | 100 | 96 | ++ |
Cys | 17 | l | 0.7 | 100 | 89 | 94 | ++ |
Thr | 16 | l | 0.7 | 94 | 94 | 93 | ++ |
Pro | 16 | r | 0.7 | 94 | 94 | 93 | ++ |
Asp=Asn | 21 | l | 1.1 | 90 | 95 | 92 | ++ |
Val=Leu=Ile=Abu=Iva | 60 | l | 2.9 | 92 | 95 | 91 | + |
Orn | 8 | l | 0.7 | 88 | 88 | 87 | + |
Gly=Ala | 32 | l | 3.3 | 81 | 90 | 84 | 0 |
Tyr | 18 | r | 2.2 | 94 | 77 | 84 | 0 |
Arg | 5 | l | 0.7 | 80 | 80 | 80 | 0 |
Phe=Trp | 14 | l | 3.7 | 57 | 67 | 60 | 0 |
The more training data are available, the more reliable the trained predictive models are. The 'quality of SVM' in the last column is therefore a qualitative measure of the MCC. Kernel type l stands for linear kernel and r for radial basis function kernel. Error rate, sensitivity (Sn), specificity (Sp) and Matthews correlation coefficient (MCC) are obtained by leave-one-out cross-validation (LOO-CV) and given as percentages.
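As a rough illustration of how the four leave-one-out figures relate to one another, the sketch below derives them from the counts of a binary confusion matrix. The function name `binary_metrics` and the example counts are hypothetical and do not come from the source. The sketch also assumes that Sp is computed as TP / (TP + FP), a convention common in sequence analysis; this is an assumption, chosen because it seems to reproduce the tabulated values, and the usual TN / (TN + FP) definition would give different numbers.

```python
import math


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Return error rate, Sn, Sp and MCC as percentages.

    Assumption (not stated in the table): Sp = TP / (TP + FP).
    """
    def ratio(num: float, den: float) -> float:
        # Avoid division by zero for degenerate confusion matrices.
        return num / den if den else 0.0

    total = tp + fp + fn + tn
    error = ratio(fp + fn, total)   # fraction of misclassified points
    sn = ratio(tp, tp + fn)         # sensitivity
    sp = ratio(tp, tp + fp)         # "specificity" under the assumed convention
    mcc = ratio(
        tp * tn - fp * fn,
        math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    )
    return {"error": 100 * error, "sn": 100 * sn, "sp": 100 * sp, "mcc": 100 * mcc}


# Hypothetical counts: 282 labeled points, 11 positives, one false positive.
row = binary_metrics(tp=11, fp=1, fn=0, tn=270)
print({name: round(value, 1) for name, value in row.items()})
```

With these assumed counts the rounded values come out as error 0.4, Sn 100, Sp 92 and MCC 96, which is consistent with the Dhb=Sal row of the large clusters. The counts themselves are inferred for illustration only.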