Image preprocessing
Design decisions
An anatomical bounding box (Bbox) extraction pipeline was used to automatically extract the coordinates of the left lung, right lung, mediastinum, and trachea from each frontal CXR image [27]. The extracted bounding boxes were reviewed and manually corrected as needed by clinicians (JAP, JSY, ECD). These anatomical Bboxes were used to create 4 additional versions of each image for augmentation: in version (1) the trachea Bbox was masked out with 0's; (2) the trachea Bbox was replaced with random noise; (3) the background and trachea Bboxes were masked out with 0's; and (4) the background and trachea Bboxes were replaced with random noise. The original and augmented images were pre-saved as JPEGs without resizing at this stage. During training, when the hyperparameter "augment_bbox" is set to true, a random version of each image (possibly including the non-augmented original) is drawn in each epoch (see Fig. 2).
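For concreteness, the following is a minimal sketch of this offline augmentation step, assuming each Bbox is given in (x, y, w, h) pixel coordinates on a grayscale uint8 image; the function names and the "_aug{i}" file-naming scheme are illustrative assumptions, not the authors' released code.

```python
import numpy as np
from PIL import Image

def apply_bbox(img: np.ndarray, bbox, fill: str) -> np.ndarray:
    """Overwrite one Bbox region with 0's or uniform random noise."""
    x, y, w, h = bbox
    out = img.copy()
    if fill == "zeros":
        out[y:y + h, x:x + w] = 0
    else:  # "noise"
        out[y:y + h, x:x + w] = np.random.randint(0, 256, (h, w), dtype=np.uint8)
    return out

def mask_background(img: np.ndarray, keep_bboxes, fill: str) -> np.ndarray:
    """Mask everything outside the kept anatomy Bboxes (lungs, mediastinum)."""
    keep = np.zeros(img.shape, dtype=bool)
    for x, y, w, h in keep_bboxes:
        keep[y:y + h, x:x + w] = True
    if fill == "zeros":
        return np.where(keep, img, 0).astype(np.uint8)
    noise = np.random.randint(0, 256, img.shape, dtype=np.uint8)
    return np.where(keep, img, noise)

def make_versions(img: np.ndarray, anatomy_bboxes: dict) -> list:
    """Return the 4 augmented versions described above for one frontal CXR."""
    trachea = anatomy_bboxes["trachea"]
    keep = [anatomy_bboxes[k] for k in ("left_lung", "right_lung", "mediastinum")]
    v1 = apply_bbox(img, trachea, "zeros")                                  # (1)
    v2 = apply_bbox(img, trachea, "noise")                                  # (2)
    v3 = apply_bbox(mask_background(img, keep, "zeros"), trachea, "zeros")  # (3)
    v4 = apply_bbox(mask_background(img, keep, "noise"), trachea, "noise")  # (4)
    return [v1, v2, v3, v4]

def save_versions(img: np.ndarray, anatomy_bboxes: dict, stem: str) -> None:
    """Pre-save the augmented versions as JPEGs at the original resolution."""
    for i, v in enumerate(make_versions(img, anatomy_bboxes), start=1):
        Image.fromarray(v).save(f"{stem}_aug{i}.jpg")
```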
Reasons
Rather than only assessing model explainability post hoc with Gradient-weighted Class Activation Mapping (Grad-CAM), we also tried to force the CXR model, during training, to learn features from the key CXR anatomies that it should rely on most heavily for prediction [27]. Non-augmented examples are included in training so that the model can also handle non-augmented CXR images at inference (i.e., in a clinical deployment setting). Performing the Bbox augmentation offline makes training not only faster but also more deterministic. We left the input image size as a tuning parameter that depends on the pre-trained teacher model.
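A sketch of how the per-epoch random draw could look as a PyTorch Dataset, assuming the pre-saved file layout from the previous snippet; the class name and the "augment_bbox" wiring are hypothetical, not the authors' implementation.

```python
import random
from PIL import Image
from torch.utils.data import Dataset

class CxrBboxAugDataset(Dataset):
    def __init__(self, stems, labels, augment_bbox=True, transform=None):
        self.stems = stems            # path stems of the pre-saved JPEGs
        self.labels = labels
        self.augment_bbox = augment_bbox
        self.transform = transform    # includes the teacher-dependent resize

    def __len__(self):
        return len(self.stems)

    def __getitem__(self, idx):
        if self.augment_bbox:
            # Draw uniformly from the original plus the 4 augmented versions,
            # so the model also sees non-augmented CXRs during training.
            suffix = random.choice(["", "_aug1", "_aug2", "_aug3", "_aug4"])
        else:
            suffix = ""
        img = Image.open(f"{self.stems[idx]}{suffix}.jpg").convert("L")
        if self.transform is not None:
            img = self.transform(img)
        return img, self.labels[idx]
```

Because the draw happens in __getitem__, each epoch re-samples a (possibly different) version of every image, while setting augment_bbox to false reproduces the plain, non-augmented pipeline used at inference.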