Table 2:
Performance metrics of KaML-CBtree for acid and base pKa and protonation state predictions in comparison to the baseline modelsᵃ
| Metric | KaML-CBtree (acid) | KaML-CBtree (base) | KaML-GAT (acid + base) | PROPKA3 (acid) | PROPKA3 (base) | Null (acid) | Null (base) |
|---|---|---|---|---|---|---|---|
| PCC | 0.88 ± 0.03 | 0.92 ± 0.03 | 0.93 ± 0.02 | 0.74 ± 0.06 | 0.90 ± 0.04 | 0.55 ± 0.06 | 0.86 ± 0.04 |
| RMSE | 0.76 ± 0.13 | 0.79 ± 0.10 | 0.90 ± 0.08 | 1.28 ± 0.15 | 0.96 ± 0.18 | 1.36 ± 0.11 | 1.04 ± 0.10 |
| MAXE | 3.17 ± 0.61 | 2.60 ± 0.70 | 3.74 ± 0.49 | 3.72 ± 0.29 | 5.04 ± 0.46 | 5.55 ± 1.24 | 2.80 ± 0.67 |
| Classification of protonation states at pH 7ᵇ | | | | | | | |
| Pre (prot) | 0.91 | 0.99 | 0.92 | 0.66 | 0.97 | 0.63 | 0.97 |
| Rec (prot) | 0.82 | 0.97 | 0.92 | 0.78 | 0.88 | 0.39 | 0.78 |
| Pre (dep) | 0.99 | 0.95 | 0.98 | 0.98 | 0.97 | 0.95 | 0.77 |
| Rec (dep) | 0.99 | 0.99 | 0.98 | 0.97 | 0.85 | 0.98 | 0.97 |
| CERᶜ | 34/2099 | 12/536 | 70/2822 | 90/2055 | 53/618 | 141/2106 | 101/716 |
ᵃ Baseline models include PROPKA3 [15] and a null model, which returns the model pKa values [50]: 3.7 for Asp, 4.2 for Glu, 6.5 for His, 8.5 for Cys, 9.5 for Tyr, and 10.4 for Lys.
ᵇ Protonation states are predicted from the probability of protonation given a predicted pKa (see main text).
ᶜ Critical error rate (CER) refers to the rate of misclassifying protonated as deprotonated or vice versa, given in the table as a ratio of counts. All classification metrics were calculated after accumulating the predictions from all 20 holdout test sets.
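The footnotes describe a null model and a pKa-based classification of protonation states without spelling out the arithmetic. The sketch below illustrates one plausible reading. It assumes the probability of protonation follows the Henderson-Hasselbalch relation, that a site is labeled protonated when this probability exceeds 0.5, and that a critical error is any disagreement between the true and predicted labels. The function names, the 0.5 threshold, and the exact error definition are illustrative assumptions; the paper's main text may define them differently.

```python
# Model pKa values quoted in the table footnote for the null model.
MODEL_PKA = {"ASP": 3.7, "GLU": 4.2, "HIS": 6.5, "CYS": 8.5, "TYR": 9.5, "LYS": 10.4}


def null_model_pka(residue: str) -> float:
    """Return the model pKa for a residue type, ignoring its environment."""
    return MODEL_PKA[residue.upper()]


def protonation_probability(pka: float, ph: float = 7.0) -> float:
    """Probability that a site is protonated (assumed Henderson-Hasselbalch form)."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def is_protonated(pka: float, ph: float = 7.0, threshold: float = 0.5) -> bool:
    """Label a site protonated when its probability exceeds the assumed threshold."""
    return protonation_probability(pka, ph) > threshold


def classification_metrics(true_pkas, pred_pkas, ph: float = 7.0) -> dict:
    """Precision and recall for the protonated class, plus the misclassification count."""
    truth = [is_protonated(p, ph) for p in true_pkas]
    pred = [is_protonated(p, ph) for p in pred_pkas]
    tp = sum(t and p for t, p in zip(truth, pred))        # protonated, predicted protonated
    fp = sum((not t) and p for t, p in zip(truth, pred))  # deprotonated, predicted protonated
    fn = sum(t and (not p) for t, p in zip(truth, pred))  # protonated, predicted deprotonated
    return {
        "precision_prot": tp / (tp + fp) if tp + fp else float("nan"),
        "recall_prot": tp / (tp + fn) if tp + fn else float("nan"),
        "critical_errors": fp + fn,  # assumed definition: any label flip
        "total": len(truth),
    }
```

To mirror the accumulation described in the footnote, the true and predicted pKa lists from all holdout test sets would be concatenated before a single call to `classification_metrics`, rather than averaging per-set metrics.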