Abstract
Purpose
To develop an automated measure of COVID-19 pulmonary disease severity on chest radiographs (CXRs), for longitudinal disease tracking and outcome prediction.
Materials and Methods
A convolutional Siamese neural network-based algorithm was trained to output a measure of pulmonary disease severity on CXRs (pulmonary x-ray severity (PXS) score), using weakly-supervised pretraining on ∼160,000 anterior-posterior images from CheXpert and transfer learning on 314 frontal CXRs from COVID-19 patients. The algorithm was evaluated on internal and external test sets from different hospitals (154 and 113 CXRs respectively). PXS scores were correlated with radiographic severity scores independently assigned by two thoracic radiologists and one in-training radiologist (Pearson r). For 92 internal test set patients with follow-up CXRs, PXS score change was compared to radiologist assessments of change (Spearman ρ). The association between PXS score and subsequent intubation or death was assessed. Bootstrap 95% confidence intervals (CI) were calculated.
Results
PXS scores correlated with radiographic pulmonary disease severity scores assigned to CXRs in the internal and external test sets (r=0.86 (95%CI 0.80-0.90) and r=0.86 (95%CI 0.79-0.90) respectively). The direction of change in PXS score in follow-up CXRs agreed with radiologist assessment (ρ=0.74 (95%CI 0.63-0.81)). In patients not intubated on the admission CXR, the PXS score predicted subsequent intubation or death within three days of hospital admission (area under the receiver operating characteristic curve=0.80 (95%CI 0.75-0.85)).
Conclusion
A Siamese neural network-based severity score automatically measures radiographic COVID-19 pulmonary disease severity, which can be used to track disease change and predict subsequent intubation or death.
Summary
A convolutional Siamese neural network-based algorithm can calculate a continuous radiographic pulmonary disease severity score in COVID-19 patients, which can be used for longitudinal disease evaluation and clinical risk stratification.
Key Points
■ A Siamese neural network-based severity score correlates with radiologist-annotated pulmonary disease severity on chest radiographs from patients with COVID-19 (r=0.86 (95% CI 0.80-0.90) and r=0.86 (95% CI 0.79-0.90) in internal and external test sets respectively).
■ The direction of change in the severity score in follow-up radiographs is concordant with radiologist assessment (ρ=0.74 (95% CI 0.63-0.81)).
■ The admission chest radiograph severity score can help predict subsequent intubation or death within three days of admission (receiver operating characteristic area under the curve=0.80 (95% CI 0.75-0.85)).
Introduction
The role of diagnostic chest imaging continues to evolve during the COVID-19 pandemic. According to American College of Radiology guidelines, while chest CT is not recommended for COVID-19 diagnosis or screening, portable chest radiographs (CXRs) are suggested when medically necessary (1). The Fleischner Society has stated that CXRs can be useful for assessing COVID-19 disease progression (2) and one study found that 69% of these patients have an abnormal baseline CXR (3).
While radiographic findings are neither sensitive nor specific for COVID-19, with findings overlapping other infections and pulmonary edema, CXRs can be useful for assessing pulmonary infection severity and evaluating longitudinal changes. However, there is substantial variability in the interpretations of CXRs by radiologists, as has been demonstrated for pneumonia (4–6). In addition, commonly used disease severity categories on chest radiographs, such as “mild,” “moderate,” and “severe,” are challenging to reproduce as the thresholds between these categories are subjective.
One possible solution to these challenges is to train a convolutional Siamese neural network to estimate radiographic disease severity on a continuous spectrum (7). Siamese neural networks take two separate images as inputs, which are passed through twinned neural networks (8,9). The Euclidean distance between the final layers of the two networks can then be calculated, which serves as a measure of the difference between the two images with respect to the imaging features being trained on, such as disease features. If an image-of-interest is compared pairwise to a pool of “normal” images, disease severity can be summarized as the median of those Euclidean distances.
In this study, we hypothesized that a convolutional Siamese neural network-based algorithm could be trained to yield a measure of radiographic pulmonary disease severity on frontal CXRs (pulmonary x-ray severity (PXS) score). We evaluated the algorithm performance on internal and external test sets of CXRs from patients with COVID-19. We also investigated the association between the admission PXS score and subsequent intubation or death.
Materials and Methods
This Health Insurance Portability and Accountability Act-compliant retrospective study was reviewed and exempted by the Institutional Review Board of Massachusetts General Hospital (Boston, MA), with waiver of informed consent.
Chest Radiograph Data
To train our model, we used a publicly available CXR data set, CheXpert, from Stanford Hospital, Palo Alto (10), for pretraining and a CXR data set from COVID-19 positive patients for subsequent training (Figure 1A). Additional COVID-19 CXR datasets were assembled for model testing and analysis of longitudinal change.
Figure 1:
A, Schematic for training the convolutional Siamese neural network-based algorithm used to calculate the Pulmonary X-Ray Severity (PXS) score, a continuous measure of radiographic pulmonary disease severity in COVID-19 patients. The network is pre-trained with chest radiographs (CXRs) from CheXpert (10) using binary lung disease presence labels and then trained on CXRs from a COVID-19 training set using annotations for modified Radiographic Assessment of Lung Edema (mRALE) scores. B, Schematic for calculating the PXS score, which is calculated by comparing the image-of-interest pairwise with a pool of normal CXRs from CheXpert. Dw = Euclidean distance; MSE loss = mean square error.
CheXpert contains 224,316 CXRs with annotations for image view, which we used to filter for anterior-posterior (AP) radiographs only, as suspected or confirmed COVID-19 positive patients tend to be imaged in the AP projection in emergency rooms and hospitals. CheXpert also includes a partition for training and validation; after filtering for AP images, the training and validation sets used for pre-training contained 161,590 and 169 images, respectively. Each image in this dataset has multiple radiology report-derived annotations that represent pulmonary parenchymal findings, including “lung opacity,” “lung lesion,” “consolidation,” “pneumonia,” “atelectasis,” and “edema.” To create a binary label for model pre-training, we considered any image with at least one of these annotations (labeled positive or uncertain) to have an abnormal lung label. All other images were considered to have normal lungs (irrespective of lines and tubes, cardiomegaly, and other findings). Of the training images, 81% had abnormal lung labels (Supplemental Table 1).
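The binary label derivation can be sketched as follows. The finding values follow CheXpert's CSV encoding (1.0 positive, -1.0 uncertain, 0.0 negative, blank when not mentioned); `row` is a hypothetical dictionary holding one image's annotations:

```python
# Parenchymal findings used for the abnormal-lung label, as listed in the text.
LUNG_FINDINGS = ["Lung Opacity", "Lung Lesion", "Consolidation",
                 "Pneumonia", "Atelectasis", "Edema"]

def abnormal_lung_label(row):
    """1 if any parenchymal finding is positive (1.0) or uncertain (-1.0),
    else 0 (normal lungs, irrespective of lines/tubes, cardiomegaly, etc.)."""
    return int(any(row.get(f) in (1.0, -1.0) for f in LUNG_FINDINGS))
```

A finding outside this subset (e.g. a fracture) does not make the lung label abnormal.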
To assemble the COVID-19 CXR datasets, we obtained raw DICOM data for CXRs from COVID-19 positive patients (confirmed by nasopharyngeal swab RT-PCR) at a large urban quaternary-care hospital in the United States (Massachusetts General Hospital [Boston, MA]). The COVID-19 training set contained 314 admission CXRs from consecutive unique patients hospitalized at least in part April 1-10, 2020, randomly partitioned 9:1 for training and validation (282:32 images). The COVID-19 internal test set contained 154 admission CXRs from consecutive unique patients hospitalized at least in part March 27-31, 2020. One hospitalized patient with COVID-19 from this time period was excluded from the test set due to prior pneumonectomy. There was no overlap between training and test set patients. Among the COVID-19 internal test set patients, 92 underwent a follow-up CXR within 12 days of admission; the DICOM data for these follow-up radiographs were also obtained for longitudinal analysis. For DICOMs containing more than one frontal image acquisition, the standard frontal CXR image without postprocessing and with the best available positioning was selected (by M.D.L., postgraduate year 4 in-training radiologist). Most of these studies were in the AP projection, as extracted from the DICOM metadata (Supplemental Table 2). Intubation and mortality data were collected from the medical record by two investigators blinded to CXR findings (A.O. and A.P., radiologists in fellowship training). We also obtained raw DICOM data for 113 consecutive admission CXRs from unique COVID-19 positive patients (confirmed by nasopharyngeal swab RT-PCR) hospitalized at least in part on April 15, 2020 at a community hospital in the United States (Newton-Wellesley Hospital [Newton, MA]), which served as an external test set.
Radiologist Scoring of Pulmonary Disease Severity on Chest Radiographs
To provide a reference standard assessment of disease severity on CXRs, we used a simplified version of the Radiographic Assessment of Lung Edema (RALE) score (11). This grading scale was originally validated for use in pulmonary edema assessment in acute respiratory distress syndrome (ARDS) and incorporates the extent and density of alveolar opacities on CXRs. The grading system is relevant to COVID-19 patients as the CXR findings tend to involve multifocal alveolar opacities (3) and many hospitalized COVID-19 patients develop ARDS (12). In our study, we use a modified RALE (mRALE) score. Each lung is assigned a score for the extent of involvement by consolidation or ground glass/hazy opacities (0=none; 1=<25%; 2=25-50%; 3=50-75%; 4=>75% involvement). Each lung score is then multiplied by an overall density score (1=hazy, 2=moderate, 3=dense). The sum of scores from each lung is the mRALE score (examples in Supplemental Figure 1). Thus, a normal CXR receives a score of 0, while a CXR with complete consolidation of both lungs receives the maximum score of 24. mRALE differs from the original RALE score in that the lungs are not divided into quadrants.
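The mRALE arithmetic can be made concrete with a short sketch (the function name and argument order are our own; the scale definitions follow the text):

```python
def mrale_score(extent_right, density_right, extent_left, density_left):
    """mRALE: per-lung extent of consolidation or ground glass/hazy opacity
    (0=none; 1=<25%; 2=25-50%; 3=50-75%; 4=>75%) multiplied by that lung's
    overall density (1=hazy, 2=moderate, 3=dense), summed over both lungs."""
    assert extent_right in range(5) and extent_left in range(5)
    assert density_right in (1, 2, 3) and density_left in (1, 2, 3)
    return extent_right * density_right + extent_left * density_left
```

A normal CXR scores 0 (extent 0 in both lungs), and complete dense consolidation of both lungs scores 4*3 + 4*3 = 24, matching the stated range.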
Using the mRALE scoring system, two in-training radiologists (M.D.L. and F.D., both postgraduate year 4) independently annotated each image in the COVID-19 training set. Two fellowship-trained thoracic radiologists (B.P.L., 11 years of experience; D.P.M., 2 years of experience) and an in-training radiologist (M.D.L. for the internal test set and F.D. for the external test set) independently annotated each image in the COVID-19 internal and external test sets. The reference standard mRALE score for each image is the average of the raters. Annotator instructions and viewing conditions are in the Supplemental Materials. Inter-rater correlations between each of the raters were evaluated.
Radiologist Assessment of Longitudinal Change
The same raters who assessed the COVID-19 internal test set also evaluated the 92 internal test set patients with follow-up CXRs. For each longitudinal image pair, the raters independently assigned the label: decreased, same, or increased pulmonary disease severity (see Supplemental Materials for annotator viewing conditions). The majority change label was assigned with two or more votes for one label.
Convolutional Siamese Neural Network Training
A convolutional Siamese neural network architecture takes two separate images as inputs, which are separately passed through identical subnetworks with shared weights (schematic in Figure 1A; see Supplemental Materials for image pre-processing details) (8,9). We built such a network using DenseNet121 (13) as the underlying subnetwork, with initial pre-training on ImageNet, as this architecture had empirically performed well for classification tasks in the CheXpert study (10). The Euclidean distance Dw between the subnetwork outputs, Gw(X1) and Gw(X2), given image input vectors X1 and X2, is calculated as Dw = ‖Gw(X1) - Gw(X2)‖2 (9).
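The shared-weight design can be illustrated with a minimal NumPy sketch; the toy weight matrix and one-layer subnetwork are illustrative stand-ins for the trained DenseNet121 twins, not the study's model:

```python
import numpy as np

# Deterministic toy weights; crucially, BOTH branches use this same matrix,
# which is what "twinned subnetworks with shared weights" means.
W = np.arange(1.0, 9.0).reshape(2, 4)

def g_w(x):
    """Toy subnetwork G_w: a linear map followed by ReLU."""
    return np.maximum(W @ x, 0.0)

def siamese_distance(x1, x2):
    """Euclidean distance D_w = ||G_w(X1) - G_w(X2)||2 between the outputs."""
    return float(np.linalg.norm(g_w(x1) - g_w(x2)))
```

Because the weights are shared, the distance is symmetric and is zero for identical inputs.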
We used a two-step training strategy that involves pre-training with weak labels on the large CheXpert data set using the contrastive loss function (8), followed by transfer learning on the relatively small COVID-19 training set using mean square error loss, with the assigned mRALE scores as disease severity labels. The contrastive loss function teaches the model the difference between abnormal and normal lungs, while the mean square error loss teaches the model a representation of the difference in mRALE scores. Details regarding the training strategy are in the Supplemental Materials. The code is available at https://github.com/QTIM-Lab/PXS-score. For comparison, models were also trained using only the first or second training step.
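The two per-pair losses can be sketched in plain Python. The margin value is an illustrative assumption, not the study's hyperparameter, and `d_w` is the Siamese distance for one image pair:

```python
def contrastive_loss(d_w, is_similar_pair, margin=2.0):
    """Step 1 (weakly-supervised pre-training), Hadsell-style contrastive
    loss: similar pairs (same normal/abnormal label) are pulled together,
    dissimilar pairs are pushed apart until they exceed the margin."""
    if is_similar_pair:
        return 0.5 * d_w ** 2
    return 0.5 * max(0.0, margin - d_w) ** 2

def severity_mse_loss(d_w, mrale_a, mrale_b):
    """Step 2 (transfer learning): mean square error regressing the pairwise
    distance onto the absolute difference of the two images' mRALE scores."""
    return (d_w - abs(mrale_a - mrale_b)) ** 2
```

Note that a dissimilar pair already farther apart than the margin contributes zero loss, so the contrastive term only shapes the embedding locally.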
Calculating the Pulmonary X-Ray Severity (PXS) Score
After training the Siamese neural network, when two CXR images are passed through the subnetworks, the Euclidean distance calculated from the subnetwork outputs can serve as a continuous measure of difference between the two CXRs, with respect to pulmonary parenchymal findings. Thus, to evaluate a single image-of-interest for pulmonary disease severity, an image can be compared to a pool of N images without a lung abnormality (schematic in Figure 1B). We created a pool of normal images using all cases labeled with “No Finding” from the CheXpert validation set (N=12, ages 19-68 years, 7 women; Supplemental Materials). Using the Siamese neural network, the Euclidean distance is calculated between the image-of-interest and each of the N normal images, and the median Euclidean distance is calculated. This median Euclidean distance is the Pulmonary X-Ray Severity (PXS) score.
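The PXS computation reduces to a median over pairwise distances. In this sketch, `distance_fn` is a hypothetical stand-in for the trained network's Dw:

```python
import numpy as np

def pxs_score(distance_fn, image, normal_pool):
    """PXS score: median Siamese distance between the image-of-interest
    and each CXR in a pool of N images without a lung abnormality."""
    return float(np.median([distance_fn(image, n) for n in normal_pool]))
```

Using the median (rather than the mean) makes the score robust to a few atypical images in the normal pool.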
Occlusion sensitivity maps for visualizing Siamese neural network outputs
We used an occlusion sensitivity approach (14) to visualize what portions of the input images were important to the Siamese neural network for calculating the PXS score. See the Supplemental Materials for details.
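Generic occlusion sensitivity (14) can be sketched as follows; the patch size, stride, and fill value are illustrative choices, not the study's settings, and `score_fn` stands in for the PXS scoring of one image:

```python
import numpy as np

def occlusion_map(score_fn, img, patch=4, stride=4, fill=0.0):
    """Slide an occluding patch over the image and record how much the
    score drops when each region is hidden; large drops mark regions the
    model relies on (here, pulmonary opacities)."""
    base = score_fn(img)
    H, W = img.shape
    rows = (H - patch) // stride + 1
    cols = (W - patch) // stride + 1
    out = np.zeros((rows, cols))
    for i, r in enumerate(range(0, H - patch + 1, stride)):
        for j, c in enumerate(range(0, W - patch + 1, stride)):
            occluded = img.copy()
            occluded[r:r + patch, c:c + patch] = fill
            out[i, j] = base - score_fn(occluded)
    return out
```

Regions whose occlusion leaves the score unchanged contribute zero to the map.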
Statistical Analysis
We used Chi-square and Mann-Whitney tests, Pearson correlation (r), Spearman rank correlation (ρ), linear Cohen’s kappa (κ), Fisher’s exact test for odds ratios, and bootstrap 95% confidence intervals where appropriate (details in the Supplemental Materials). The threshold for statistical significance was set a priori at P<0.05.
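A percentile bootstrap for a paired statistic can be sketched as follows; the resample count and the statistic passed in are illustrative, not the study's exact procedure:

```python
import random

def bootstrap_ci(xs, ys, stat, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap (1 - alpha) CI for a paired statistic
    (e.g. a correlation): resample (x, y) pairs with replacement,
    recompute the statistic, and take percentile bounds."""
    rng = random.Random(seed)
    n = len(xs)
    reps = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        reps.append(stat([xs[i] for i in idx], [ys[i] for i in idx]))
    reps.sort()
    lo = reps[int(n_boot * alpha / 2)]
    hi = reps[int(n_boot * (1 - alpha / 2)) - 1]
    return lo, hi
```

Resampling patients (pairs) rather than individual values preserves the x-y pairing that the statistic depends on.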
Results
COVID-19 Data Set Characteristics
There was no significant difference in age, sex, or mRALE scores between the training set and internal test set; patients in the external test set were significantly older than those in the training and internal test sets, but there was no significant difference in sex or mRALE scores (Table 1). Of the 468 patients in the combined training and internal test sets, 134 were intubated or died within 3 days of hospital admission; age and mRALE scores were significantly higher in these patients (Table 2).
Table 1.
Summary of dataset characteristics and radiologist mRALE scores. N, Number; Q1-Q3, Quartile 1 to Quartile 3 (i.e. interquartile range).
Table 2.
Patient and CXR characteristics stratified by outcome for the combined training and internal test set data (N = 468). N, Number; Q1-Q3, Quartile 1 to Quartile 3 (i.e. interquartile range).
mRALE Score Inter-Rater Correlation
The correlation between the mRALE scores assigned by the radiologist raters was similar in the COVID-19 datasets (r=0.84-0.88, P<0.001 in all cases; see Supplemental Materials for details).
Siamese Neural Network-based PXS Score Correlates with mRALE score
In the internal test set, the Siamese neural network-based PXS score correlated with the average mRALE score assigned, which is a measure of radiographic pulmonary disease severity (r=0.86 (95% CI 0.80-0.90), P<0.001) (Figure 2A). In the external test set, the PXS score also correlated with the average mRALE score assigned (r=0.86 (95% CI 0.79-0.90), P<0.001) (Figure 2B). Using an occlusion sensitivity map-based approach, we show that the network focuses its attention on pulmonary opacities (Figure 2C). Pre-training improved model performance (Table 3; Supplemental Materials).
Figure 2:
Siamese neural network-based Pulmonary X-Ray Severity (PXS) score is a measure of radiographic pulmonary disease severity in patients with COVID-19. A and B, Scatterplots show that, in a 154-patient internal test set (A) and a 113-patient external hospital test set (B), the PXS score correlates with the modified Radiographic Assessment of Lung Edema (mRALE) score, a measure of pulmonary disease severity on chest radiographs (r=0.86, P<0.001 and r=0.86, P<0.001, respectively) (linear regression 95% confidence interval shown in the scatterplots). C, Occlusion sensitivity map-based approach shows that the Siamese neural network focuses on pulmonary opacities. Yellow areas indicate parts of the image important to the neural network.
Table 3.
Siamese neural network model performance was improved by pre-training on CheXpert using weak labels (abnormal versus normal lung) followed by training using COVID-19 CXRs annotated for lung disease severity (mRALE score).
Longitudinal Change Assessment with the PXS Score
Of the internal test set patients with available longitudinal CXRs, according to the assigned majority vote change labels, 24 (26%) showed decreased, 19 (21%) unchanged, and 44 (48%) increased pulmonary disease severity. Five patients (5%) did not receive majority votes (i.e., the three raters each voted differently; examples in Supplemental Figure 2) and were omitted from further analysis, which reflects subjectivity in the interpretation of heterogeneous CXRs. The inter-rater reliability for assigning change labels was moderate for each pair of raters (linear Cohen’s κ=0.58, 0.59, 0.57).
The change in PXS score between two longitudinally acquired images correlates with the majority vote change label (ρ=0.74 (95% CI 0.63-0.81), P<0.001) (Figure 3A). Of the patients labeled with decreased disease severity, 18 (75%) had a decreased PXS score. Of the patients labeled with increased disease severity, 43 (98%) had an increased PXS score. For patients labeled with no change, the mean PXS score change was 0.1 (standard deviation ± 1.3). Illustrative examples of longitudinal change assessment are shown in Figure 3B. In cases labeled for no change but with a PXS score absolute change >1, variations in inspiratory effort and positioning seem to account for the PXS change (examples shown in Supplemental Figure 3).
Figure 3:
Siamese neural network-based Pulmonary X-Ray Severity (PXS) score can be used to assess longitudinal change in radiographic disease severity over time in COVID-19 patients. A, Boxplot shows the PXS score correlates with majority vote change in pulmonary disease severity (ρ=0.74, P<0.001), where -1, 0, and 1 indicate decreased, unchanged, and increased severity in longitudinal chest radiograph pairs, assigned by three independent raters (2 thoracic radiologists, 1 in-training radiologist). The boxplot boxes indicate the median and interquartile range (IQR), with whiskers extending to points within 1.5 IQRs of the IQR boundaries. B, Examples of PXS score evaluation of longitudinal change in three patients with COVID-19.
Association Between PXS Score and Intubation or Death
The PXS score was significantly higher on admission CXRs of patients with COVID-19 from our training and internal test sets who were intubated or died within 3 days of admission, compared with those who were not (median PXS score 7.9 versus 3.2, P<0.001) (Figure 4A). Importantly, the PXS score algorithm is not trained on outcomes data. Of the 134 patients who were intubated or died within 3 days of admission, 76 were intubated or died on the admission day, and 31, 12, and 15 patients on hospital days 1, 2, and 3, respectively. A higher PXS score is associated with a shorter time interval before intubation or death in these patients (ρ=0.25, P=0.004) (Figure 4B).
Figure 4:
Siamese neural network-based Pulmonary X-Ray Severity (PXS) score is associated with intubation in patients hospitalized with COVID-19. A, Boxplot shows the PXS score is significantly higher in patients intubated within three days of hospital admission (P<0.001). B, Boxplot shows that a higher PXS score is associated with a shorter time interval before intubation (ρ=0.25, P=0.004). C, Receiver operating characteristic and precision recall curves show the performance of the PXS score for predicting subsequent intubation within three days of hospital admission, in patients without an endotracheal tube on their admission chest radiograph (AUC, area under the curve; dashed lines indicate bootstrap 95% confidence intervals).
Given these findings, we used the PXS score as a continuous input for prediction of intubation or death within 3 days of hospital admission. For the 437 patients without an endotracheal tube present on the admission CXR, the receiver operating characteristic area under the curve (AUC) was 0.80 (bootstrap 95% CI 0.75-0.85) (Figure 4C). The PXS threshold can be set at different levels to obtain different test characteristics, which can also be expressed as odds ratios (Table 4).
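An AUC like the one reported here can be computed without tracing the curve, via its equivalence to the Mann-Whitney U statistic. This sketch assumes the admission scores have been split by outcome:

```python
def roc_auc(scores_pos, scores_neg):
    """ROC AUC via the Mann-Whitney U statistic: the probability that a
    randomly chosen positive case (here, intubated or died) scores higher
    than a randomly chosen negative case, counting ties as half."""
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in scores_pos for n in scores_neg
    )
    return wins / (len(scores_pos) * len(scores_neg))
```

An AUC of 0.80 thus means a patient who went on to intubation or death had an 80% chance of a higher admission PXS score than one who did not.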
Table 4.
Admission radiograph PXS scores in hospitalized patients with COVID-19 (without endotracheal tube on admission CXR, N = 437) are associated with increased odds ratios for subsequent intubation or death within 3 days of admission. N, number.
Discussion
Front-line clinicians estimate the risk for clinical decompensation in patients with COVID-19 using a combination of data, including epidemiologic factors, comorbidities, vital signs, lab values, and clinical intuition (12,15). The chest radiograph can help contribute to this assessment, but manual assessment of severity is subjective and requires expertise. In this study, we designed and trained a Siamese neural network-based algorithm to provide an automated measure of COVID-19 disease severity on chest radiographs in hospitalized patients, the Pulmonary X-ray Severity (PXS) score. The PXS score correlates with a manually annotated measure of radiographic disease severity in internal and external test sets, and the direction of change in PXS score for longitudinally acquired radiographs is concordant with radiologist assessment. For patients with COVID-19 presenting to the hospital with an admission chest radiograph, the PXS score can help predict subsequent intubation or death.
The automated PXS score can potentially be rapidly scaled and deployed, which has important clinical applications in the COVID-19 pandemic, particularly in settings where CXRs are frequently acquired but CT studies are relatively rarely obtained, such as the United States and under-resourced environments. For example, in the emergency room, clinicians must decide whether or not a patient is safe to discharge home. By setting the PXS score threshold in favor of sensitivity for prediction of intubation or death, the score can be used to help with such decisions. Additionally, the PXS score can potentially be used to improve existing and new COVID-19 machine learning models that account for other variables like vital signs, lab values, and co-morbidities (16). Other potential applications include radiologist workflow optimization, where CXRs with more severe findings can be interpreted earlier, and hospital resource management, where the PXS score can help with resource allocation (e.g., prediction of future ventilator need).
Various grading systems have been developed to measure respiratory disease severity on chest imaging, including for pulmonary edema in ARDS (11), severe acute respiratory infection (17), parainfluenza virus-associated infections (18), and pediatric pneumonia (19). A manual radiographic grading system for COVID-19 lung disease severity has been associated with increased odds of intubation (20). These studies use manually annotated features from chest imaging to predict outcomes, such as mortality, need for intensive care, and other adverse events. However, barriers to adoption of these systems include limited inter-rater reliability and the learning curve for users; in our study, raters assessing longitudinal change showed only moderate inter-rater agreement. Our automated Siamese neural network-based approach addresses these challenges.
Deep learning-based algorithms have been applied to CXRs extensively, but primarily for disease detection, such as for pneumonia and tuberculosis (21,22), as well as for COVID-19 localization on CXR images (23). However, due to the nature of chest radiography, there are limits to the sensitivity and specificity of this modality for COVID-19 detection (3). There is a relative paucity of research using deep learning for disease severity assessment on CXRs. Automated evaluation of pulmonary edema severity on CXRs has been explored using a deep learning model that incorporates ordinal regression of edema severity labels in training (no, mild, moderate, or severe edema) (24). These severity labels were extracted from associated radiology reports, but are inherently noisy given the variability in interpretation of the CXRs (25,26). This problem of noisy labels extends beyond pulmonary edema to any disease process where there is subjectivity in interpretation. Our Siamese neural network-based approach mitigates the label noise via transfer learning on data labeled with mRALE, a more fine-grained scoring system which showed high agreement between raters in our study. In addition, pre-training of the Siamese neural network on public data with weak labels helped boost performance.
There are limitations to this study. First, patients in this study were from urban areas of the United States, which may limit the generalizability of this algorithm to other locations. However, given that the model generalized to a second hospital (community hospital vs quaternary care center) with similar performance, it appears reasonably robust. The generalizability between two hospitals also suggests the model tolerates differences in image acquisition technique, including x-ray machinery, beam penetration, and technologist technique. Second, abnormal patient positioning and respiratory phase may introduce variability that may affect algorithm performance. However, since the algorithm explicitly learns to assess radiographic disease severity, quality control is relatively simple, as the PXS score can be compared visually to what is expected on sample studies. Third, our algorithm was trained using predominantly AP chest radiographs, as AP positioning is more common than posterior-anterior (PA) positioning among patients with COVID-19. This may limit the generalizability of the algorithm to PA radiographs, and future testing on PA test sets is required. Fourth, the longitudinal images were presented to raters as side-by-side JPEG image pairs for convenience, which could be less accurate than if the studies were viewed in PACS.
We developed an automated Siamese neural network-based pulmonary disease severity score for patients with COVID-19, with the potential to help with clinical triage and workflow optimization. With further validation, the score could be incorporated into clinical treatment guidelines for use together with other clinical and laboratory data. The score could also be validated for association with or prediction of other outcomes, such as oxygen saturation. Beyond the COVID-19 pandemic, this automated severity score could be modified and applied to other continuous disease processes manifesting on chest radiographs, such as pulmonary edema, interstitial lung disease, and other infections.
Acknowledgments
The authors thank Jeremy Irvin for sharing the CheXpert pre-processing script.
Research reported in this publication was supported by a training grant from the National Institute of Biomedical Imaging and Bioengineering (NIBIB) of the National Institutes of Health under award number 5T32EB1680 and by the National Cancer Institute (NCI) of the National Institutes of Health under Award Number F30CA239407 to K. Chang. This study was supported by National Institutes of Health grants U01 CA154601, U24 CA180927, and U24 CA180918 to J. Kalpathy-Cramer. This research was carried out in whole or in part at the Athinoula A. Martinos Center for Biomedical Imaging at the Massachusetts General Hospital, using resources provided by the Center for Functional Neuroimaging Technologies, P41EB015896, a P41 Biotechnology Resource Grant supported by the National Institute of Biomedical Imaging and Bioengineering (NIBIB), National Institutes of Health. GPU computing resources were provided by the MGH and BWH Center for Clinical Data Science.
Abbreviations:
- COVID-19
- coronavirus disease 2019
- CXR
- chest radiograph
- RT-PCR
- reverse transcriptase-polymerase chain reaction
- AP
- anterior-posterior
- mRALE score
- modified Radiographic Assessment of Lung Edema score
- PXS score
- pulmonary x-ray severity score
- AUC
- area under the curve
- CI
- confidence interval
References
- 1.ACR Recommendations for the use of Chest Radiography and Computed Tomography (CT) for Suspected COVID-19 Infection | American College of Radiology. https://www.acr.org/Advocacy-and-Economics/ACR-Position-Statements/Recommendations-for-Chest-Radiography-and-CT-for-Suspected-COVID19-Infection. Accessed March 27, 2020.
- 2.Rubin GD, Haramati LB, Kanne JP, et al. The Role of Chest Imaging in Patient Management during the COVID-19 Pandemic: A Multinational Consensus Statement from the Fleischner Society. Radiology. 2020;201365.
- 3.Wong HYF, Lam HYS, Fong AH-T, et al. Frequency and Distribution of Chest Radiographic Findings in COVID-19 Positive Patients. Radiology. 2020;201160.
- 4.Albaum MN, Hill LC, Murphy M, et al. Interobserver reliability of the chest radiograph in community-acquired pneumonia. Chest. 1996;110(2):343–350.
- 5.Loeb MB, Carusone SBC, Marrie TJ, et al. Interobserver Reliability of Radiologists’ Interpretations of Mobile Chest Radiographs for Nursing Home-Acquired Pneumonia. J Am Med Dir Assoc. 2006;7(7):416–419.
- 6.Neuman MI, Lee EY, Bixby S, et al. Variability in the interpretation of chest radiographs for the diagnosis of pneumonia in children. J Hosp Med. 2012;7(4):294–298.
- 7.Li MD, Chang K, Bearce B, et al. Siamese neural networks for continuous disease severity evaluation and change detection in medical imaging. NPJ Digit Med. 2020;3(1):48.
- 8.Bromley J, Bentz JW, Bottou L, et al. Signature Verification Using a “Siamese” Time Delay Neural Network. Int J Pattern Recognit Artif Intell. 1993;7(4):669–688.
- 9.Hadsell R, Chopra S, LeCun Y. Dimensionality Reduction by Learning an Invariant Mapping. 2006 IEEE Comput Soc Conf Comput Vis Pattern Recognit - Vol 2. IEEE; 1735–1742. http://ieeexplore.ieee.org/document/1640964/. Accessed June 9, 2019.
- 10.Irvin J, Rajpurkar P, Ko M, et al. CheXpert: A Large Chest Radiograph Dataset with Uncertainty Labels and Expert Comparison. 2019. http://arxiv.org/abs/1901.07031. Accessed January 4, 2020.
- 11.Warren MA, Zhao Z, Koyama T, et al. Severity scoring of lung oedema on the chest radiograph is associated with clinical outcomes in ARDS. Thorax. 2018;73(9):840–846.
- 12.Zhou F, Yu T, Du R, et al. Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: a retrospective cohort study. Lancet. 2020;395(10229):1054–1062.
- 13.Huang G, Liu Z, van der Maaten L, Weinberger KQ. Densely Connected Convolutional Networks. Proc - 30th IEEE Conf Comput Vis Pattern Recognition, CVPR 2017. Institute of Electrical and Electronics Engineers Inc.; 2017:2261–2269. http://arxiv.org/abs/1608.06993. Accessed March 29, 2020.
- 14.Zeiler MD, Fergus R. Visualizing and Understanding Convolutional Networks. 2013. http://arxiv.org/abs/1311.2901. Accessed December 7, 2019.
- 15.Phua J, Weng L, Ling L, et al. Intensive care management of coronavirus disease 2019 (COVID-19): challenges and recommendations. Lancet Respir Med. 2020.
- 16.Wynants L, Van Calster B, Bonten MMJ, et al. Prediction models for diagnosis and prognosis of covid-19 infection: Systematic review and critical appraisal. BMJ. 2020;369.
- 17.Taylor E, Haven K, Reed P, et al. A chest radiograph scoring system in patients with severe acute respiratory infection: A validation study. BMC Med Imaging. 2015;15(1).
- 18.Sheshadri A, Shah DP, Godoy M, et al. Progression of the Radiologic Severity Index predicts mortality in patients with parainfluenza virus-associated lower respiratory infections. PLoS One. 2018;13(5):e0197418.
- 19.McClain L, Hall M, Shah SS, et al. Admission chest radiographs predict illness severity for children hospitalized with pneumonia. J Hosp Med. 2014;9(9):559–564.
- 20.Toussie D, Voutsinas N, Finkelstein M, et al. Clinical and Chest Radiography Features Determine Patient Outcomes In Young and Middle Age Adults with COVID-19. Radiology. 2020;201754.
- 21.Rajpurkar P, Irvin J, Zhu K, et al. CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning. 2017. http://arxiv.org/abs/1711.05225. Accessed January 4, 2020.
- 22.Lakhani P, Sundaram B. Deep learning at chest radiography: Automated classification of pulmonary tuberculosis by using convolutional neural networks. Radiology. 2017;284(2):574–582.
- 23.Hurt B, Kligerman S, Hsiao A. Deep Learning Localization of Pneumonia. J Thorac Imaging. 2020;1.
- 24.Liao R, Rubin J, Lam G, et al. Semi-supervised Learning for Quantification of Pulmonary Edema in Chest X-Ray Images. 2019. http://arxiv.org/abs/1902.10785. Accessed January 4, 2020.
- 25.Kennedy S, Simon B, Alter HJ, Cheung P. Ability of Physicians to Diagnose Congestive Heart Failure Based on Chest X-Ray. J Emerg Med. 2011;40(1):47–52.
- 26.Hammon M, Dankerl P, Voit-Höhne HL, et al. Improving diagnostic accuracy in assessing pulmonary edema on bedside chest radiographs using a standardized scoring approach. BMC Anesthesiol. 2014;14:94.
- 27.Sabottke CF, Spieler BM. The Effect of Image Resolution on Deep Learning in Radiography. Radiol Artif Intell. 2020;2(1):e190015.
- 28.Mason D. SU-E-T-33: Pydicom: An Open Source DICOM Library. Med Phys. 2011;38(6Part10):3493–3493.
- 29.The OpenCV Library | Dr Dobb’s. https://www.drdobbs.com/open-source/the-opencv-library/184404319. Accessed April 12, 2020.
- 30.Kingma DP, Ba J. Adam: A Method for Stochastic Optimization. 2014. http://arxiv.org/abs/1412.6980. Accessed June 16, 2019.