Author manuscript; available in PMC: 2023 Nov 4.

Published in final edited form as: J Vasc Interv Radiol. 2020 May 4;31(6):1018–1024.e4. doi: 10.1016/j.jvir.2019.11.030

Table 3.

Model Performance Metrics

Metric	TTB	TIPS	UAE
AUROC	0.913	0.788	0.879
Maximum F1 score	0.532	0.376	0.700
Precision (at maximum F1 score)	0.426	0.279	0.563
Recall (at maximum F1 score)	0.709	0.576	0.915
Sensitivity	90.0%	90.0%	90.0%
Specificity (at sensitivity of 90%)	82.4%	45.3%	68.0%
Threshold (at sensitivity of 90%)	0.209	0.103	0.195

Note–Performance metrics for each random forest model, evaluated on the testing set. AUROC summarizes each model’s overall discrimination across all classification thresholds. The maximum F1 score is useful for evaluating each model on imbalanced data (ie, when far more patients lack the outcome of interest than have it). The F1 score is defined as the harmonic mean of precision (positive predictive value) and recall (sensitivity). The precision and recall values corresponding to the maximum F1 score are also provided. Threshold is the classifier output cutoff at which sensitivity was fixed at 90%; the specificity at that cutoff is reported.

AUROC = area under the receiver operating characteristic curve; TIPS = transjugular intrahepatic portosystemic shunt; TTB = transthoracic biopsy; UAE = uterine artery embolization.
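
The two definitions in the note (F1 as the harmonic mean of precision and recall, and a threshold chosen to fix sensitivity at 90%) can be illustrated with a minimal Python sketch. This is not the authors' code. The function names `f1_score` and `threshold_at_sensitivity` are hypothetical, the rule that a case is called positive when its score is at or above the threshold is an assumption, and the scores and labels in the second part are made-up toy data, not study data. Only the precision and recall values for TTB and TIPS are taken from the table.

```python
import math


def f1_score(precision, recall):
    """Harmonic mean of precision (PPV) and recall (sensitivity)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def threshold_at_sensitivity(scores, labels, target=0.90):
    """Return (threshold, sensitivity, specificity) for the largest cutoff
    whose sensitivity is at least `target`.

    Assumption: a case is called positive when score >= threshold.
    """
    positives = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")

    # Number of positive cases that must be flagged to reach the target
    # (small epsilon guards against floating-point overshoot before ceil).
    k = max(1, math.ceil(target * len(positives) - 1e-9))
    threshold = positives[k - 1]

    sensitivity = sum(s >= threshold for s in positives) / len(positives)
    specificity = sum(s < threshold for s in negatives) / len(negatives)
    return threshold, sensitivity, specificity


# Precision and recall at maximum F1, copied from the table.
f1_ttb = f1_score(0.426, 0.709)
f1_tips = f1_score(0.279, 0.576)
print(f"TTB  F1 = {f1_ttb:.3f}")
print(f"TIPS F1 = {f1_tips:.3f}")

# Toy data (NOT from the study): 10 cases with the outcome, 10 without.
toy_pos = [0.95, 0.90, 0.85, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.05]
toy_neg = [0.02, 0.04, 0.06, 0.08, 0.10, 0.15, 0.20, 0.25, 0.35, 0.45]
toy_scores = toy_pos + toy_neg
toy_labels = [1] * len(toy_pos) + [0] * len(toy_neg)

thr, sens, spec = threshold_at_sensitivity(toy_scores, toy_labels, target=0.90)
print(f"threshold = {thr}, sensitivity = {sens:.1%}, specificity = {spec:.1%}")
```

For the TTB and TIPS columns, the harmonic mean of the tabulated precision and recall matches the reported maximum F1 score to three decimal places. The toy example shows why specificity varies across models even though sensitivity is held constant: the cutoff needed to capture 90% of positive cases depends on how well each model separates the two groups.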