2025 Aug 21;13:e72938. doi: 10.2196/72938

Figure 4. Heatmaps illustrating how gradually introducing random label noise (NAR) degrades model performance. In each panel, the x-axis denotes the percentage of true negatives flipped to false positives, and the y-axis denotes the percentage of true positives flipped to false negatives. In the top-left heatmap, the ROC AUC on the training set is plotted; lighter cells signify stronger discrimination, and values above 0.5 represent performance better than random chance. The top-right heatmap presents the corresponding average precision on the same noisy training data, with lighter colors indicating a more favorable precision-recall trade-off. The bottom-left and bottom-right heatmaps repeat these experiments on the held-out test set, showing ROC AUC and average precision, respectively, under increasing label noise in the test data. The unusual behavior of average precision is discussed in this paper. Each cell is annotated with the exact metric value for that combination of false positive and false negative noise levels. AUC: area under the curve; NAR: noise at random; ROC AUC: area under the receiver operating characteristic curve.
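The experiment behind the heatmaps can be sketched as follows: flip a chosen fraction of true positives to false negatives and of true negatives to false positives, then recompute the metric against the noisy labels. This is an illustrative sketch only (not the paper's code); the function names, synthetic scores, and 40% flip rates are hypothetical choices made for the example.

```python
# Sketch of NAR label-noise injection and its effect on ROC AUC.
# All names and parameters here are illustrative, not from the paper.
import random

def flip_labels(labels, fn_rate, fp_rate, rng):
    """Flip fn_rate of the positives to 0 and fp_rate of the negatives to 1."""
    noisy = list(labels)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    for i in rng.sample(pos, int(fn_rate * len(pos))):
        noisy[i] = 0  # true positive flipped to false negative (heatmap y-axis)
    for i in rng.sample(neg, int(fp_rate * len(neg))):
        noisy[i] = 1  # true negative flipped to false positive (heatmap x-axis)
    return noisy

def roc_auc(labels, scores):
    """Mann-Whitney form of ROC AUC: P(score_pos > score_neg), ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

rng = random.Random(0)
# Synthetic classifier with real signal: positives score higher on average.
labels = [1] * 500 + [0] * 500
scores = [rng.gauss(1.0 if y == 1 else 0.0, 1.0) for y in labels]

clean_auc = roc_auc(labels, scores)
noisy_auc = roc_auc(flip_labels(labels, 0.4, 0.4, rng), scores)
print(round(clean_auc, 3), round(noisy_auc, 3))
```

Because flipped labels mix the score distributions of the two classes, the measured AUC shrinks toward 0.5 as the flip rates grow, which is the gradient visible in each heatmap panel.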