(A) Test set segmentation AUPRC scores for SIIM-ACR
Pneumothorax Segmentation dataset and (B) test set bounding
box detection AUPRC scores for RSNA Pneumonia Detection Challenge
dataset. Each box plot represents the distribution of scores across the
test datasets for each saliency map, with a solid line denoting the
median and a dashed line denoting the mean. Results are compared with a
low baseline using the average segmentation or bounding box of the
training and validation sets (light blue) and a high baseline using U-Net
or RetinaNet (dark blue). (C) Example saliency maps on the
SIIM-ACR pneumothorax dataset with corresponding AUPRC scores and
(D) on the RSNA pneumonia dataset with corresponding
utility scores. “AVG” refers to using the average of all
ground-truth masks (for pneumothorax) or bounding boxes (for pneumonia)
across the training and validation datasets; “UNET” refers
to using a U-Net trained on a segmentation task to localize
pneumothorax; “RNET” refers to using RetinaNet to generate
bounding boxes localizing pneumonia. ACR =
American College of Radiology, AUPRC = area under the precision-recall
curve, GBP = guided backpropagation, GCAM = gradient-weighted class
activation mapping, GGCAM = guided GCAM, GRAD = gradient explanation, IG
= integrated gradients, RSNA = Radiological Society of North America, SG
= SmoothGrad, SIG = smooth IG, SIIM = Society for Imaging Informatics in
Medicine.
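As an illustration of the AUPRC metric and the “AVG” low baseline described above, the following is a minimal sketch in NumPy. It assumes that each pixel's saliency value is treated as a score and each pixel of the binary ground-truth mask as a label, and that AUPRC is computed as step-wise average precision; the caption does not specify the exact computation, so this may differ from the one used for the figure. The helper names (`pixel_auprc`, `average_mask_baseline`, `boxes_to_mask`) are hypothetical. The `(x, y, width, height)` box format used to rasterize pneumonia bounding boxes into masks is also an assumption.

```python
import numpy as np


def pixel_auprc(saliency: np.ndarray, mask: np.ndarray) -> float:
    """Step-wise area under the precision-recall curve (average precision).

    Each pixel's saliency value is a score; each pixel of the binary
    ground-truth mask is a label (assumed formulation, see lead-in).
    """
    scores = saliency.ravel().astype(float)
    labels = mask.ravel().astype(bool)
    n_pos = labels.sum()
    if n_pos == 0:
        raise ValueError("mask has no positive pixels")
    # Sort pixels by descending saliency (stable sort for reproducibility).
    order = np.argsort(-scores, kind="mergesort")
    scores, labels = scores[order], labels[order]
    tp = np.cumsum(labels)
    fp = np.cumsum(~labels)
    # Keep only the last index of each run of tied scores so that
    # tied pixels share a single threshold.
    distinct = np.r_[np.nonzero(np.diff(scores))[0], scores.size - 1]
    tp, fp = tp[distinct], fp[distinct]
    precision = tp / (tp + fp)
    recall = tp / n_pos
    recall_prev = np.r_[0.0, recall[:-1]]
    # Sum precision weighted by the increase in recall at each threshold.
    return float(np.sum((recall - recall_prev) * precision))


def average_mask_baseline(train_masks: list) -> np.ndarray:
    """Hypothetical "AVG" baseline: pixel-wise mean of training/validation masks."""
    return np.mean(np.stack(train_masks).astype(float), axis=0)


def boxes_to_mask(boxes: list, shape: tuple) -> np.ndarray:
    """Rasterize (x, y, width, height) boxes into a binary mask (assumed format)."""
    mask = np.zeros(shape, dtype=bool)
    for x, y, w, h in boxes:
        mask[y:y + h, x:x + w] = True
    return mask


if __name__ == "__main__":
    train = [np.array([[1, 0], [0, 0]]), np.array([[1, 1], [0, 0]])]
    baseline = average_mask_baseline(train)  # [[1.0, 0.5], [0.0, 0.0]]
    test_mask = np.array([[0, 1], [0, 0]])
    print(pixel_auprc(baseline, test_mask))  # 0.5
```

Under this formulation, a saliency map that assigns a constant value to every pixel scores an AUPRC equal to the fraction of positive pixels, which is why a spatial prior such as the average mask is a meaningful low baseline to compare against.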