Table 2.
ML model performance exceeds no-skill prediction in 20% holdout subsets on the basis of four evaluation metrics
Cause | Model | ML Model Performance Metric | | | |
---|---|---|---|---|---|
 | | ROC-AUC | PR-AUC | F1 Score | MCC |
None | No-skill | 0.5 | Prevalence | 0 | 0 |
FSGS | SVM | 0.94 (0.93, 0.95) | 0.60 (0.57, 0.63) | 0.59 (0.57, 0.61) | 0.59 (0.57, 0.61) |
FSGS | RF | 0.89 (0.88, 0.90) | 0.50 (0.48, 0.51) | 0.47 (0.46, 0.48) | 0.45 (0.44, 0.46) |
FSGS | XGB | 0.91 (0.90, 0.91) | 0.54 (0.53, 0.56) | 0.48 (0.47, 0.49) | 0.47 (0.46, 0.48) |
OU | SVM | 0.84 (0.84, 0.85) | 0.52 (0.51, 0.53) | 0.54 (0.53, 0.54) | 0.44 (0.44, 0.45) |
OU | RF | 0.73 (0.73, 0.74) | 0.39 (0.38, 0.40) | 0.42 (0.41, 0.42) | 0.28 (0.27, 0.29) |
OU | XGB | 0.79 (0.79, 0.80) | 0.45 (0.43, 0.46) | 0.48 (0.47, 0.48) | 0.37 (0.37, 0.38) |
A/D/H | SVM | 0.84 (0.83, 0.85) | 0.51 (0.50, 0.52) | 0.53 (0.51, 0.54) | 0.44 (0.42, 0.45) |
A/D/H | RF | 0.68 (0.68, 0.69) | 0.30 (0.29, 0.31) | 0.38 (0.38, 0.39) | 0.24 (0.23, 0.25) |
A/D/H | XGB | 0.75 (0.75, 0.76) | 0.38 (0.37, 0.39) | 0.43 (0.42, 0.44) | 0.32 (0.31, 0.33) |
RN | SVM | 0.80 (0.79, 0.81) | 0.37 (0.36, 0.38) | 0.41 (0.40, 0.42) | 0.34 (0.33, 0.35) |
RN | RF | 0.66 (0.65, 0.66) | 0.19 (0.19, 0.20) | 0.31 (0.30, 0.31) | 0.20 (0.19, 0.21) |
RN | XGB | 0.73 (0.72, 0.73) | 0.25 (0.25, 0.26) | 0.33 (0.33, 0.34) | 0.25 (0.25, 0.26) |
All 12 of our ML models (three algorithms for each of four cause subgroups) exceeded no-skill prediction on all four evaluation metrics in the 20% holdout subsets. ROC-AUC is the most traditional and familiar of the four metrics, but it may overestimate model performance in samples with a low case prevalence, as in CKiD. PR-AUC accounts for the skewed case distribution. However, because its no-skill baseline equals the case prevalence, the magnitude of PR-AUC gives little insight into model performance beyond whether the model surpassed no-skill prediction. The F1 score is the harmonic mean of precision and recall and behaves similarly to PR-AUC. Unlike PR-AUC, the magnitude of the F1 score does reflect model performance, with 0 being equivalent to no-skill and 1 being perfect prediction. MCC behaves similarly to the F1 score but additionally includes true negatives in its calculation, which gives this metric directionality: perfect negative (inverse) prediction = −1, no-skill prediction = 0, and perfect positive prediction = +1.
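The relationship between the F1 score and MCC can be illustrated with a minimal sketch that computes both from a single binary confusion matrix. The function name `f1_and_mcc` and all counts below are hypothetical and chosen only for illustration; they are not CKiD data, and the no-skill rule shown (labeling every patient negative) is one assumed definition of no-skill prediction.

```python
import math


def f1_and_mcc(tp, fp, fn, tn):
    """Return (F1, MCC) for one binary confusion matrix.

    Undefined ratios (zero denominators) are reported as 0.0,
    a common convention that is assumed here.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # MCC uses all four cells, including true negatives.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return f1, mcc


# Hypothetical sample: 100 cases among 1,000 patients (10% prevalence).
perfect = f1_and_mcc(tp=100, fp=0, fn=0, tn=900)      # every label correct
inverted = f1_and_mcc(tp=0, fp=900, fn=100, tn=0)     # every label wrong
all_negative = f1_and_mcc(tp=0, fp=0, fn=100, tn=900) # assumed no-skill rule

# Same TP, FP, and FN, but different numbers of true negatives:
# F1 is unchanged, whereas MCC shifts.
many_tn = f1_and_mcc(tp=60, fp=40, fn=40, tn=860)
few_tn = f1_and_mcc(tp=60, fp=40, fn=40, tn=60)

for name, (f1, mcc) in [("perfect", perfect), ("inverted", inverted),
                        ("all_negative", all_negative),
                        ("many_tn", many_tn), ("few_tn", few_tn)]:
    print(f"{name:>12}: F1={f1:.2f}  MCC={mcc:.2f}")
```

In this sketch, the last two cases share an identical F1 score because F1 ignores true negatives, while MCC differs between them; this is the sense in which MCC carries information that F1 does not.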