This is a preprint.

It has not yet been peer reviewed by a journal.

medRxiv [Preprint]. 2020 Sep 18:2020.09.15.20195453. [Version 1] doi: 10.1101/2020.09.15.20195453

Improvement and Multi-Population Generalizability of a Deep Learning-Based Chest Radiograph Severity Score for COVID-19

Matthew D Li, Nishanth T Arun, Mehak Aggarwal, Sharut Gupta, Praveer Singh, Brent P Little, Dexter P Mendoza, Gustavo CA Corradi, Marcelo S Takahashi, Suely F Ferraciolli, Marc D Succi, Min Lang, Bernardo C Bizzo, Ittai Dayan, Felipe C Kitamura, Jayashree Kalpathy-Cramer
PMCID: PMC7523150  PMID: 32995811

Abstract

Purpose: To improve and test the generalizability of a deep learning-based model for assessment of COVID-19 lung disease severity on chest radiographs (CXRs) from different patient populations.

Materials and Methods: A published convolutional Siamese neural network-based model previously trained on hospitalized patients with COVID-19 was tuned using 250 outpatient CXRs. This model produces a quantitative measure of COVID-19 lung disease severity, the pulmonary x-ray severity (PXS) score. The model was evaluated on CXRs from four test sets, including 3 from the United States (patients hospitalized at an academic medical center (N=154), patients hospitalized at a community hospital (N=113), and outpatients (N=108)) and 1 from Brazil (patients at an academic medical center emergency department (N=303)). Radiologists from both countries independently assigned reference standard CXR severity scores, which were correlated with the PXS scores as a measure of model performance (Pearson r). The Uniform Manifold Approximation and Projection (UMAP) technique was used to visualize the neural network results.

Results: Tuning the deep learning model with outpatient data improved model performance in two United States hospitalized patient datasets (r=0.88 and r=0.90, compared to baseline r=0.86). Model performance was similar, though slightly lower, when tested on the United States outpatient and Brazil emergency department datasets (r=0.86 and r=0.85, respectively). UMAP showed that the model learned disease severity information that generalized across test sets.

Conclusions: Performance of a deep learning-based model that extracts a COVID-19 severity score on CXRs improved using training data from a different patient cohort (outpatient versus hospitalized) and generalized across multiple populations.
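
The performance measure named in the abstract, the Pearson correlation (r) between model PXS scores and radiologist reference scores, can be illustrated with a minimal Python sketch. This is not the authors' evaluation code. The function name `pearson_r` and both score lists are hypothetical, and the numbers are made up for illustration rather than taken from the study.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two paired sequences."""
    if len(xs) != len(ys):
        raise ValueError("inputs must have the same length")
    if len(xs) < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Deviations of each observation from its own mean.
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    # Sum of cross-products and sums of squares (the 1/n factors cancel).
    cov = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values for illustration only; NOT data from the study.
reference_scores = [0, 1, 2, 4, 5, 7, 9, 12]             # assumed radiologist scores
pxs_scores = [0.4, 1.1, 1.9, 3.8, 5.5, 6.4, 9.3, 11.0]   # assumed model outputs

r = pearson_r(reference_scores, pxs_scores)
print(f"Pearson r = {r:.2f}")
```

One r value would be computed per test set in this way. Because Pearson r measures linear association rather than agreement, a high r does not by itself imply that the two scores share the same scale.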

Full Text Availability

The license terms selected by the author(s) for this preprint version do not permit archiving in PMC. The full text is available from the preprint server.


Articles from medRxiv are provided here courtesy of Cold Spring Harbor Laboratory Preprints
