Table 1. Performance evaluation of the linear SVMs across five PTM types. The number of true positive sites used in the 10-fold cross-validation is about half the amount of data present in the PSP database, after removal of redundant protein sequences and of those that do not have secondary structure information from SPIDER3. AUC – area under the curve; MCC – the highest Matthews correlation coefficient over all score thresholds; sensitivity and specificity are reported at the score threshold corresponding to the highest MCC value.
PTM type | No. of proteins | No. of PSP sites | Window size 25 | | | |
 | | | AUC | MCC | Sensitivity | Specificity |
Acetylation (K) | 3729 | 10 479 | 0.66 | 0.25 | 0.61 | 0.64 |
Methylation (K) | 1521 | 2566 | 0.74 | 0.39 | 0.61 | 0.76 |
Ubiquitination (K) | 4874 | 22 592 | 0.64 | 0.22 | 0.67 | 0.54 |
SUMOylation (K) | 1020 | 2996 | 0.77 | 0.42 | 0.63 | 0.79 |
Methylation (R) | 2301 | 5450 | 0.79 | 0.47 | 0.62 | 0.84 |
Phosphorylation (S) | 8510 | 76 008 | 0.74 | 0.36 | 0.70 | 0.66 |
Phosphorylation (T) | 6982 | 28 359 | 0.72 | 0.33 | 0.66 | 0.66 |
Phosphorylation (Y) | 6097 | 18 645 | 0.70 | 0.30 | 0.72 | 0.58 |
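The caption's definition of the reported operating point (the score threshold with the highest MCC, with sensitivity and specificity read off at that threshold) can be illustrated with a minimal sketch. The function names `mcc` and `best_mcc_operating_point`, the convention that a site is predicted positive when its score is at or above the threshold, and the toy scores and labels are all assumptions made for illustration; they are not the authors' evaluation code or data.

```python
import math


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; returned as 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def best_mcc_operating_point(scores, labels):
    """Try every observed score as a threshold and keep the one with the highest MCC.

    Assumption: a site is predicted positive when score >= threshold.
    Labels are 1 (modified site) or 0 (unmodified site); both classes must be present.
    """
    if 1 not in labels or 0 not in labels:
        raise ValueError("both positive and negative labels are required")
    best = None
    for threshold in sorted(set(scores)):
        pairs = list(zip(scores, labels))
        tp = sum(1 for s, y in pairs if s >= threshold and y == 1)
        fp = sum(1 for s, y in pairs if s >= threshold and y == 0)
        fn = sum(1 for s, y in pairs if s < threshold and y == 1)
        tn = sum(1 for s, y in pairs if s < threshold and y == 0)
        value = mcc(tp, fp, tn, fn)
        if best is None or value > best["mcc"]:
            best = {
                "threshold": threshold,
                "mcc": value,
                "sensitivity": tp / (tp + fn),  # true positive rate
                "specificity": tn / (tn + fp),  # true negative rate
            }
    return best


# Toy data for illustration only (not taken from the table above).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 1, 0, 0]
result = best_mcc_operating_point(toy_scores, toy_labels)
print(result)
```

On the toy data the scan selects the threshold 0.7. When several thresholds tie for the highest MCC, this sketch keeps the lowest one; the source does not say how ties were resolved.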