Table I.
Modification type | Residue | Positive sites | Negative sites | AUC (No PSSM) | AUC (PSSM) | sn at sp = 0.90 (No PSSM) | sn at sp = 0.90 (PSSM) | sn at sp = 0.95 (No PSSM) | sn at sp = 0.95 (PSSM) | sn at sp = 0.99 (No PSSM) | sn at sp = 0.99 (PSSM) |
---|---|---|---|---|---|---|---|---|---|---|---|
Acetylation | K | 6,848 | 149,314 | 0.688 | 0.713 | 0.277 | 0.312 | 0.168 | 0.188 | 0.046 | 0.057 |
ADP-ribosylation | E, R | 108 | 4,681 | 0.739 | 0.753 | 0.356 | 0.369 | 0.221 | 0.236 | 0.063 | 0.062 |
Amidation | All | 457 | 29,966* | 0.964 | 0.967 | 0.923 | 0.930 | 0.851 | 0.866 | 0.570 | 0.615 |
C-linked glycosylation | W | 32 | 118 | 0.938 | 0.928 | 0.756 | 0.837 | 0.606 | 0.750 | 0.454 | 0.415 |
Carboxylation | E | 112 | 1,063 | 0.920 | 0.939 | 0.795 | 0.843 | 0.641 | 0.767 | 0.359 | 0.535 |
Disulfide linkage | C | 9,736 | 7,101 | 0.646 | 0.783 | 0.182 | 0.391 | 0.110 | 0.246 | 0.037 | 0.078 |
Farnesylation | C | 41 | 59* | 0.857 | 0.862 | 0.533 | 0.633 | 0.319 | 0.225 | 0.174 | 0.041 |
Geranylgeranylation | C | 30 | 43* | 0.866 | 0.919 | 0.571 | 0.687 | 0.393 | 0.596 | 0.230 | 0.534 |
GPI-anchor amidation | N | 84 | 2,362 | 0.961 | 0.966 | 0.908 | 0.905 | 0.841 | 0.853 | 0.518 | 0.528 |
Hydroxylation | K, P, Y | 219 | 4,209 | 0.832 | 0.907 | 0.535 | 0.732 | 0.388 | 0.586 | 0.109 | 0.253 |
Methylation | K, R | 628 | 18,561 | 0.660 | 0.674 | 0.319 | 0.349 | 0.243 | 0.264 | 0.130 | 0.127 |
Myristoylation | G | 99 | 119* | 0.792 | 0.852 | 0.353 | 0.514 | 0.175 | 0.354 | 0.038 | 0.008 |
N-linked glycosylation | N | 11,286 | 78,050 | 0.790 | 0.806 | 0.215 | 0.330 | 0.066 | 0.160 | 0.018 | 0.030 |
N-terminal acetylation | A, G, M, S, T | 1,310 | 2,002* | 0.821 | 0.836 | 0.471 | 0.503 | 0.310 | 0.331 | 0.093 | 0.106 |
O-linked glycosylation | S, T | 1,427 | 44,048 | 0.731 | 0.749 | 0.350 | 0.376 | 0.228 | 0.253 | 0.059 | 0.082 |
Palmitoylation | C | 245 | 1,298 | 0.856 | 0.881 | 0.625 | 0.679 | 0.467 | 0.525 | 0.192 | 0.244 |
Phosphorylation | S, T, Y | 90,058 | 320,506 | 0.771 | 0.777 | 0.422 | 0.437 | 0.296 | 0.312 | 0.113 | 0.116 |
Proteolytic cleavage | All | 997 | 257,783 | 0.727 | 0.759 | 0.379 | 0.420 | 0.264 | 0.291 | 0.085 | 0.102 |
PUPylation | K | 87 | 1,077 | 0.658 | 0.786 | 0.218 | 0.436 | 0.123 | 0.256 | 0.042 | 0.059 |
Pyrrolidone carb. acid | Q | 275 | 2,789 | 0.880 | 0.906 | 0.682 | 0.770 | 0.538 | 0.658 | 0.188 | 0.389 |
Sulfation | Y | 121 | 667 | 0.913 | 0.930 | 0.772 | 0.832 | 0.575 | 0.629 | 0.304 | 0.268 |
SUMOylation | K | 744 | 17,539 | 0.742 | 0.739 | 0.419 | 0.458 | 0.311 | 0.360 | 0.135 | 0.172 |
Ubiquitylation | K | 1,092 | 27,774 | 0.583 | 0.605 | 0.164 | 0.185 | 0.089 | 0.101 | 0.020 | 0.025 |
Each row shows mean performance for one PTM type, combined over all amino acids and subdata sets (motifs and non-motifs). A breakdown of performance for each amino acid and data set is provided in Supporting Information, Table S1. The “No PSSM” columns represent the basic classification model and the “PSSM” columns represent the model enhanced with evolutionary features. Area under the ROC curve (AUC) is shown for each PTM, together with sensitivity (sn; true positive rate) at three fixed levels of specificity (sp; true negative rate). ROC curves for each amino acid and data set are provided in Supporting Information, Figure S3. Values marked in bold indicate the better-performing model. For data sets marked with “*”, the negatives were obtained through a random sampling procedure from proteins different from those containing the positives (see Materials and Methods).
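The two kinds of metric reported in the table can be illustrated with a short sketch. It is not the authors' evaluation code: the function name `auc_and_sn_at_sp`, its parameters, and the tie-handling rule are assumptions made for illustration. AUC is computed as the probability that a randomly chosen positive site outscores a randomly chosen negative site (ties count one half), and sensitivity at a fixed specificity is obtained by choosing the lowest score threshold that rejects at least the required fraction of negatives.

```python
import numpy as np

def auc_and_sn_at_sp(scores, labels, specificities=(0.90, 0.95, 0.99)):
    """Illustrative only: AUC and sensitivity at fixed specificity levels.

    scores: classifier outputs, higher means "more likely modified".
    labels: 1/True for positive sites, 0/False for negative sites.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]

    # AUC as the fraction of (positive, negative) pairs ranked correctly,
    # with tied pairs counted as half (Mann-Whitney formulation).
    diff = pos[:, None] - neg[None, :]
    auc = (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (pos.size * neg.size)

    neg_sorted = np.sort(neg)
    sn = {}
    for sp in specificities:
        # Number of negatives that must be rejected to reach specificity sp
        # (the small epsilon guards against floating-point round-up).
        k = int(np.ceil(sp * neg.size - 1e-9))
        # A site is called positive only if its score exceeds the threshold,
        # so negatives tied with the threshold are still rejected.
        threshold = neg_sorted[k - 1] if k > 0 else -np.inf
        sn[sp] = float(np.mean(pos > threshold))
    return auc, sn
```

Calling `auc_and_sn_at_sp(scores, labels)` on the held-out predictions for one PTM type would give one AUC value and one sensitivity per specificity level, which is the shape of a single model's entries in a table row. With few positive sites, as for several PTM types above, sensitivity at sp = 0.99 depends on only a handful of negatives above the threshold and can vary noticeably between runs.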