2021 May 21;2(8):1225–1239. doi: 10.34067/KID.0007552020

Table 3.

Random forest model performance metrics for prediction of the composite renal and mortality end point (n=126)

Predicted probability threshold^a	>10%			>30%			>50%
Parameter	Clinical	Biomarker	Clinical and Biomarker	Clinical	Biomarker	Clinical and Biomarker	Clinical	Biomarker	Clinical and Biomarker
Sensitivity	0.92	0.96	0.98	0.79	0.85	0.88	0.69	0.71	0.69
Specificity	0.36	0.24	0.21	0.69	0.65	0.71	0.81	0.83	0.82
Positive predictive value	0.47	0.44	0.43	0.61	0.60	0.65	0.69	0.72	0.70
Negative predictive value	0.87	0.90	0.94	0.84	0.88	0.90	0.81	0.82	0.81

Three random forest classification models, each using a different set of predictors, were built to predict the composite renal and mortality end point: clinical variables alone (age, sex, hypertension, diabetes, and baseline eGFR), serum biomarkers alone, and clinical variables plus serum biomarkers. Each model was evaluated by leave-one-out cross-validation: one individual was excluded, the random forest was trained on the remaining 125 participants, and the trained model then predicted the class of the excluded individual. This process was repeated for every individual in the dataset, so that each study participant received a predicted class from each of the three model types.
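The leave-one-out procedure described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: to stay dependency-free it swaps in a hypothetical soft nearest-centroid classifier (`nearest_centroid_trainer`) where the study used a random forest, and the function names and toy data are invented. Any trainer that returns a probability function can be plugged in; with scikit-learn, for instance, a `RandomForestClassifier` passed to `cross_val_predict(..., cv=LeaveOneOut(), method="predict_proba")` produces the same kind of out-of-fold probabilities.

```python
from typing import Callable, List, Sequence

Row = Sequence[float]
# A trainer fits on (rows, labels) and returns a function mapping one row to P(end point).
Trainer = Callable[[List[Row], List[int]], Callable[[Row], float]]


def loocv_probabilities(X: List[Row], y: List[int], train: Trainer) -> List[float]:
    """Leave-one-out CV: each participant is scored by a model that never saw them."""
    probs: List[float] = []
    for i in range(len(X)):
        # Exclude participant i from the training data.
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        model = train(X_train, y_train)  # refit from scratch on the other n - 1 rows
        probs.append(model(X[i]))        # predict only the held-out participant
    return probs


def nearest_centroid_trainer(X_train: List[Row], y_train: List[int]) -> Callable[[Row], float]:
    """Hypothetical stand-in for a random forest (assumes each class keeps >= 1 training row)."""
    def centroid(label: int) -> List[float]:
        rows = [r for r, t in zip(X_train, y_train) if t == label]
        return [sum(col) / len(rows) for col in zip(*rows)]

    c0, c1 = centroid(0), centroid(1)

    def predict(row: Row) -> float:
        d0 = sum((a - b) ** 2 for a, b in zip(row, c0)) ** 0.5
        d1 = sum((a - b) ** 2 for a, b in zip(row, c1)) ** 0.5
        # Closer to the end-point centroid (c1) -> probability nearer 1.
        return d0 / (d0 + d1) if d0 + d1 > 0 else 0.5

    return predict


# Invented toy data: one feature, three participants without (0) and three with (1) the end point.
X = [[0.0], [0.2], [0.1], [1.0], [0.9], [1.1]]
y = [0, 0, 0, 1, 1, 1]
probs = loocv_probabilities(X, y, nearest_centroid_trainer)
print([round(p, 2) for p in probs])
```

The key design point is that the model is refit inside the loop, so no participant's own data ever influences the probability assigned to them.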

Model performance metrics were calculated at three predicted probability thresholds (10%, 30%, and 50%) for labeling a patient as having developed the composite renal and mortality end point. For example, an individual whose predicted probability of the composite renal and mortality end point from a random forest model was 45% would be labeled as having developed the end point at the 10% and 30% thresholds, but not at the 50% threshold.
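The thresholding rule and the four metrics reported in Table 3 can be sketched as follows, assuming the out-of-fold predicted probabilities are available as a list. The strict "greater than" comparison follows the ">10%", ">30%", and ">50%" column headings; how a probability exactly equal to a threshold was handled is not stated, so that choice, the NaN returned for an empty denominator, and the function names are assumptions.

```python
import math
from typing import Dict, List

THRESHOLDS = (0.10, 0.30, 0.50)  # the three cut-offs evaluated in Table 3


def label_at_threshold(probs: List[float], threshold: float) -> List[int]:
    """Label a participant as reaching the end point (1) if P > threshold, else 0."""
    return [1 if p > threshold else 0 for p in probs]


def performance(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Sensitivity, specificity, PPV and NPV from a 2x2 confusion table."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num: int, den: int) -> float:
        return num / den if den else math.nan  # undefined if the denominator is empty

    return {
        "sensitivity": ratio(tp, tp + fn),  # TP / all with the end point
        "specificity": ratio(tn, tn + fp),  # TN / all without the end point
        "ppv": ratio(tp, tp + fp),          # TP / all labeled positive
        "npv": ratio(tn, tn + fn),          # TN / all labeled negative
    }


# The 45% example from the text: positive at 10% and 30%, negative at 50%.
print([label_at_threshold([0.45], t)[0] for t in THRESHOLDS])  # [1, 1, 0]

# Invented toy outcomes and probabilities, scored at each threshold.
y_true = [1, 1, 0, 0]
probs = [0.90, 0.20, 0.40, 0.05]
for t in THRESHOLDS:
    print(t, performance(y_true, label_at_threshold(probs, t)))
```

Raising the threshold labels fewer participants as positive, which is why sensitivity falls and specificity rises from the >10% to the >50% columns of Table 3.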