Table 3.
Average performance of the A-cell epitope prediction models on the training and independent datasets
| Features | Threshold | Sensitivity (%) | Specificity (%) | Accuracy (%) | MCC | Parameters |
|---|---|---|---|---|---|---|
| Internal validation: performance on training dataset, evaluated using fivefold cross-validation | | | | | | |
| DPC | −0.2 | 87.49 ± 1.41 | 98.70 ± 0.16 | 97.68 ± 0.22 | 0.86 ± 0.01 | g: 0.0005, c: 1, j: 4 |
| DPC + Motif | −0.2 | 87.81 ± 1.01 | 99.30 ± 0.10 | 98.25 ± 0.17 | 0.89 ± 0.01 | g: 0.0005, c: 1, j: 4 |
| External validation: performance on independent dataset | | | | | | |
| DPC | −0.2 | 87.54 ± 4.31 | 98.87 ± 0.28 | 97.84 ± 0.41 | 0.87 ± 0.02 | g: 0.0005, c: 1, j: 4 |
| DPC + Motif | −0.2 | 77.86 ± 5.84 | 99.28 ± 0.30 | 97.33 ± 0.58 | 0.83 ± 0.04 | g: 0.0005, c: 1, j: 4 |
These training and independent datasets were created from alternate datasets using bagging. In the alternate dataset, the negatives (non-epitopes) were derived from human proteins. Performance values are reported as mean ± standard deviation for each model.
MCC, Matthews correlation coefficient; DPC, dipeptide composition; DPC + Motif, dipeptide composition with MERCI motif score. SVM parameters: g, gamma parameter of the radial basis function; c, trade-off between training error and margin; j, regularization parameter (cost factor by which training errors on positive examples outweigh errors on negative examples).