2021 Apr 27;49(W1):W285–W292. doi: 10.1093/nar/gkab295

Table 2.

Isoelectric point prediction accuracy on leave-out 25% datasets

| Method (protein dataset) (a) | RMSE | MAE | R² | Outliers (c) | Method (peptide dataset) (b) | RMSE | MAE | R² | Outliers (c) |
|---|---|---|---|---|---|---|---|---|---|
| IPC2.protein.svr.19 | 0.8479 | 0.5906 | 0.5934 | 247 | IPC2.peptide.Conv2D | 0.2216 | 0.1216 | 0.9761 | 2691 |
| IPC2_protein | 0.8608 | 0.6052 | 0.5748 | 251 | IPC2.peptide.svr.19 | 0.2299 | 0.1155 | 0.9743 | 2490 |
| IPC_protein | 0.8677 | 0.6109 | 0.5760 | 250 | IPC2_peptide | 0.2482 | 0.1394 | 0.9700 | 3179 |
| ProMoST | 0.9113 | 0.6444 | 0.5183 | 263 | Bjellqvist | 0.4051 | 0.2836 | 0.9204 | 11639 |
| Toseland | 0.9278 | 0.6537 | 0.5095 | 250 | Nozaki | 0.4083 | 0.2673 | 0.9191 | 9837 |
| Dawson | 0.9365 | 0.6586 | 0.4977 | 263 | DTASelect | 0.4235 | 0.2796 | 0.9130 | 10606 |
| Bjellqvist | 0.9369 | 0.6536 | 0.5005 | 260 | Thurlkill | 0.4466 | 0.2535 | 0.9033 | 7182 |
| Wikipedia | 0.9484 | 0.6795 | 0.4860 | 262 | Sillero | 0.4747 | 0.2696 | 0.8907 | 7607 |
| Rodwell | 0.9579 | 0.6762 | 0.4706 | 262 | Dawson | 0.4910 | 0.2642 | 0.8831 | 6698 |
| Grimsley | 0.9588 | 0.6953 | 0.4779 | 265 | Wikipedia | 0.5178 | 0.2974 | 0.8700 | 8326 |
| Lehninger | 0.9617 | 0.6783 | 0.4607 | 266 | Grimsley | 0.5264 | 0.3796 | 0.8656 | 15956 |
| Solomon | 0.9631 | 0.6746 | 0.4606 | 272 | Rodwell | 0.5855 | 0.3429 | 0.8337 | 9857 |
| pIR | 1.0148 | 0.7556 | 0.4161 | 315 | Toseland | 0.5860 | 0.3896 | 0.8335 | 13152 |
| Nozaki | 1.0164 | 0.7219 | 0.3980 | 288 | EMBOSS | 0.5971 | 0.3557 | 0.8271 | 11022 |
| Thurlkill | 1.0250 | 0.7573 | 0.3948 | 302 | PredpI-iTRAQ8 | 0.6302 | 0.3503 | 0.8027 | 12059 |
| DTASelect | 1.0278 | 0.7798 | 0.3947 | 319 | PredpI-TMT6 | 0.6365 | 0.3518 | 0.7988 | 12135 |
| EMBOSS | 1.0498 | 0.7757 | 0.3734 | 308 | PredpI-plain | 0.6480 | 0.3710 | 0.7913 | 12813 |
| Sillero | 1.0519 | 0.7694 | 0.3461 | 308 | IPC_peptide | 0.7459 | 0.4860 | 0.7302 | 13599 |
| Patrickios | 2.3764 | 1.8414 | <0 | 517 | Solomon | 0.7518 | 0.4929 | 0.7259 | 13777 |
| PredpI-TMT6 | NA | NA | NA | NA | Lehninger | 0.7697 | 0.5209 | 0.7127 | 15200 |
| PredpI-plain | NA | NA | NA | NA | pIR | 0.8529 | 0.7303 | 0.6387 | 27158 |
| PredpI-iTRAQ8 | NA | NA | NA | NA | ProMoST | 1.1026 | 0.7562 | 0.4104 | 18513 |
| | | | | | Patrickios | 2.0172 | 1.3927 | <0 | 22818 |

(a) Protein dataset consisting of 581 proteins (25% randomly chosen proteins, not used for the training or optimization).

(b) Peptide dataset consisting of 29 774 peptides (25% randomly chosen peptides, not used for the training or optimization).

(c) Outliers were defined using thresholds of 0.5 pH units (protein dataset) and 0.25 pH units (peptide dataset) for the difference between the predicted and experimental pI.

NA: The PredpI program was designed for peptides only within the 3.7–4.9 pH range; thus, for proteins, it returned 0 and could not be evaluated on the protein dataset.

New machine learning models developed in this study are in bold. The first version of IPC (12) is underlined. Scores were calculated after 10-fold cross-validation. The table is sorted by RMSE. For individual methods’ predictions, see Supplementary Data 2. For more details about the datasets, see Table 1.
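
As an illustration of how the four columns in the table relate to paired predicted and experimental pI values, the following is a minimal sketch, not the authors' evaluation code. It assumes that R² is the coefficient of determination (which can fall below zero, consistent with the "<0" entries) and that an outlier is a prediction whose absolute error strictly exceeds the threshold; the function name `pi_metrics` and the toy values are hypothetical.

```python
import math


def pi_metrics(predicted, experimental, outlier_threshold):
    """Return (rmse, mae, r2, n_outliers) for paired pI values.

    Assumptions (not confirmed by the source): R^2 is computed as
    1 - SS_res / SS_tot, and an outlier has |error| > outlier_threshold.
    """
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(predicted)
    errors = [p - e for p, e in zip(predicted, experimental)]

    # Residual sum of squares, reused for RMSE and R^2.
    ss_res = sum(err ** 2 for err in errors)
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(err) for err in errors) / n

    # Total sum of squares around the mean experimental pI.
    mean_exp = sum(experimental) / n
    ss_tot = sum((e - mean_exp) ** 2 for e in experimental)
    # R^2 is undefined when all experimental values are identical.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    n_outliers = sum(1 for err in errors if abs(err) > outlier_threshold)
    return rmse, mae, r2, n_outliers


if __name__ == "__main__":
    # Toy values for illustration only; these are not data from the paper.
    predicted = [5.0, 6.0, 7.5]
    experimental = [5.0, 6.5, 7.0]
    # 0.25 pH units is the peptide threshold stated in the footnote.
    rmse, mae, r2, n_out = pi_metrics(predicted, experimental, 0.25)
    print(f"RMSE={rmse:.4f} MAE={mae:.4f} R2={r2:.4f} outliers={n_out}")
```

With the thresholds given in the footnote, the same function would be called with 0.5 for the protein dataset and 0.25 for the peptide dataset.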