Comparison of, A, receiver operating characteristic
(ROC) curves for DenseNet-121 (NN) and
NN+PL (mean of NN score and prospective label [PL]
score) classifiers and, B, area under the ROC curve
(AUC) histograms obtained from a 1000-sample test set
by using the bootstrap method. Each ROC curve represents the output of one
representative NN model. In B, solid lines indicate mean
values, and dashed lines indicate one standard deviation from the mean. Data set
size (K = 1000 points) refers to the total size (training
+ development, 90-to-10 split).
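
The two computations named in this caption, averaging the NN and PL scores and bootstrapping the AUC over a test set, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`combine_scores`, `auc`, `bootstrap_auc`), the number of resamples, and the seed are hypothetical, and the AUC is computed here with the rank-based (Mann-Whitney) formula, which may differ from the method actually used.

```python
import random
import statistics


def combine_scores(nn_scores, pl_scores):
    """NN+PL score: per-case mean of the NN score and the PL score."""
    return [(a + b) / 2 for a, b in zip(nn_scores, pl_scores)]


def auc(labels, scores):
    """AUC from the Mann-Whitney rank-sum statistic (tied scores share a rank)."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j to the end of the group of tied scores starting at i.
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative case")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_auc(labels, scores, n_resamples=200, seed=0):
    """Resample the test set with replacement; return (mean, SD, all AUCs).

    n_resamples and seed are illustrative defaults, not values from the study.
    """
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    while len(aucs) < n_resamples:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            aucs.append(auc(ys, [scores[i] for i in idx]))
    return statistics.mean(aucs), statistics.stdev(aucs), aucs
```

The list of AUC values returned by `bootstrap_auc` is what a histogram such as the one in B would be drawn from, with the returned mean and standard deviation giving the positions of the solid and dashed lines.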