Fig. 4.
Comparison of pCR prediction performance for sTILs, subtype, and pCR-score over 16 repeated validations. In each repeat, 60% of the data were randomly selected as training data and the remaining 40% as testing data; the mean of each metric was calculated over the 16 repeats to limit the impact of data bias. A F1 score and accuracy of the models. B AUCs of the models. C Sensitivity (equal to recall), PPV (equal to precision), specificity, and NPV of the models. D TP, FN, FP, and TN counts in the confusion matrices of the models.
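
The threshold-based metrics in panels A and C all follow from the four confusion-matrix counts in panel D, and the reported values are means over the repeated 60/40 splits. The sketch below illustrates one way this could be computed using only the Python standard library. It is not the authors' code: the function names (`metrics_from_counts`, `repeated_holdout_indices`, `mean_metrics`), the seed, and the use of simple unstratified shuffling are assumptions made for illustration. AUC (panel B) is omitted because it needs continuous scores rather than counts.

```python
import random
from statistics import mean


def metrics_from_counts(tp, fn, fp, tn):
    """Derive the panel A and C metrics from the panel D confusion-matrix counts."""
    def ratio(num, den):
        # Return NaN rather than raising when a denominator is zero.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),   # recall
        "ppv": ratio(tp, tp + fp),           # precision
        "specificity": ratio(tn, tn + fp),
        "npv": ratio(tn, tn + fn),
        "accuracy": ratio(tp + tn, tp + fn + fp + tn),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


def repeated_holdout_indices(n_samples, n_repeats=16, train_fraction=0.6, seed=0):
    """Yield (train, test) index lists for repeated random hold-out splits.

    Assumption: plain shuffling without stratification; the caption does not
    say whether the splits were stratified by outcome.
    """
    rng = random.Random(seed)
    n_train = round(n_samples * train_fraction)
    splits = []
    for _ in range(n_repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        splits.append((idx[:n_train], idx[n_train:]))
    return splits


def mean_metrics(per_repeat):
    """Average each metric over the repeats, as described in the caption."""
    return {key: mean(m[key] for m in per_repeat) for key in per_repeat[0]}
```

In use, a model would be fitted on each training index list, scored on the matching test index list, and the per-repeat dictionaries from `metrics_from_counts` passed to `mean_metrics`.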