2022 Jan 18;14(1):2023938. doi: 10.1080/19420862.2021.2023938

Table 2.

Comparison of online carbonylation-site predictors based on machine learning methods. Abbreviations: KNN, k-nearest neighbor; ROC, receiver operating characteristic.

| Carbonylation predictor | Machine learning algorithm | Parameters | Data set | Area under the ROC curve¹ | Reference |
|---|---|---|---|---|---|
| CarSPred | Weighted support vector machine | Position-specific propensity of amino acids, k-spaced amino acid pairs, KNN scores, physicochemical properties (electric properties, hydrophobicity, alpha and turn propensities, etc.) | 331 lysine, 131 arginine, 128 threonine, and 129 proline carbonylation sites were extracted from 230 carbonylated human proteins. In addition, 22 lysine, 3 arginine, 6 threonine, and 15 proline carbonylation sites were extracted from carbonylated mouse, rabbit, and bovine proteins. | Lysine: 0.6704; Arginine: 0.5345; Threonine: 0.6800; Proline: 0.7873 | 106 |
| iCar-PseCp | Random forest | Pseudo amino acid composition | Data were derived from 230 carbonylated human protein sequences and 20 carbonylated proteins from Photobacterium and Escherichia coli. | Lysine: 0.8728; Arginine: 0.8668; Threonine: 0.8603; Proline: 0.8484 | 108 |
| iCarPS | Random forest | 3-D conical coordinates and physicochemical properties (hydrophobicity, hydrophilicity, mass, pK1, pK2, pI, rigidity, flexibility, and irreplaceability) | Same benchmark dataset as Lv et al. (2014).¹⁰⁹ | Lysine: 0.789; Arginine: 0.726; Threonine: 0.790; Proline: 0.814 | 109 |

¹The area under the curve was calculated from the ROC curve, which plots sensitivity (true positive rate) against 1 − specificity (false positive rate) as the decision threshold varies.
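The footnote's definition of the ROC curve and its area can be made concrete with a small sketch. This is a minimal plain-Python illustration, not the evaluation code used for any predictor in the table: `roc_curve` and `roc_auc` are hypothetical helper names, and the scores and labels are invented. It lowers a decision threshold from the highest score downward, records one (false positive rate, true positive rate) point per distinct score, and integrates with the trapezoidal rule. Tied scores cross the threshold together and produce a diagonal segment.

```python
from typing import Sequence


def roc_curve(scores: Sequence[float], labels: Sequence[int]) -> list[tuple[float, float]]:
    """Return ROC points as (false positive rate, true positive rate), one per distinct score.

    labels: 1 for a carbonylation site (positive), 0 for a non-site (negative).
    Hypothetical helper for illustration only.
    """
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative example")

    # Sort by descending score so lowering the threshold admits examples in order.
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # All examples sharing this score cross the threshold together.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve, integrated with the trapezoidal rule."""
    pts = roc_curve(scores, labels)
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))
```

For example, with scores `[0.9, 0.8, 0.7, 0.6, 0.4, 0.3]` and labels `[1, 1, 0, 1, 0, 0]`, `roc_auc` gives 8/9 (about 0.889), which equals the fraction of positive/negative pairs in which the positive site receives the higher score (ties count one half). That equivalence is why an AUC of 0.5 corresponds to chance-level ranking, a useful reference point when reading values such as 0.5345 for arginine in the table.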
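CarSPred lists k-spaced amino acid pairs among its input parameters. A minimal sketch of one common way such a feature is computed (often called the composition of k-spaced amino acid pairs) follows. The window sequence, the value of k, normalizing by the number of valid pairs, and skipping non-standard residues such as the padding symbol `X` are all assumptions for illustration, not CarSPred's actual settings. `kspaced_pair_composition` is a hypothetical name.

```python
from itertools import product

# The 20 standard amino acids and all 400 ordered residue pairs.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def kspaced_pair_composition(window: str, k: int) -> dict[str, float]:
    """Frequency of each ordered residue pair separated by exactly k intervening residues.

    Hypothetical helper: `window` is a sequence fragment centred on a candidate site.
    Pairs containing non-standard or padding characters are skipped, and frequencies
    are normalized by the number of valid pairs (an assumed convention).
    """
    counts = dict.fromkeys(PAIRS, 0)
    total = 0
    # Residue i pairs with residue i + k + 1, leaving k residues between them.
    for i in range(len(window) - k - 1):
        pair = window[i] + window[i + k + 1]
        if pair in counts:
            counts[pair] += 1
            total += 1
    return {p: (c / total if total else 0.0) for p, c in counts.items()}
```

For the window `"KAKAK"` with k = 1, the valid pairs are KK, AA, and KK, so the feature vector has KK = 2/3, AA = 1/3, and 0 for the other 398 pairs. In practice such vectors are typically computed for several values of k and concatenated before being passed to a classifier.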