Scientific Reports. 2023 Oct 31;13:18761. doi: 10.1038/s41598-023-46126-8

MultiCOVID: a multi modal deep learning approach for COVID-19 diagnosis

Max Hardy-Werbin 1,3, José Maria Maiques 2, Marcos Busto 2, Isabel Cirera 3, Alfons Aguirre 3, Nieves Garcia-Gisbert 1, Flavio Zuccarino 2, Santiago Carbullanca 2, Luis Alexander Del Carpio 2, Didac Ramal 2, Ángel Gayete 2, Jordi Martínez-Roldan 4, Albert Marquez-Colome 5, Beatriz Bellosillo 1,6, Joan Gibert 1,6
PMCID: PMC10618492  PMID: 37907750

Abstract

The rapid spread of severe acute respiratory syndrome coronavirus 2 led to a global overextension of healthcare. Both chest X-rays (CXR) and blood tests have demonstrated predictive value for Coronavirus Disease 2019 (COVID-19) diagnosis under different prevalence scenarios. With the objective of improving and accelerating COVID-19 diagnosis, a multimodal prediction algorithm (MultiCOVID) based on CXR and blood tests was developed to discriminate between COVID-19, heart failure, non-COVID pneumonia and healthy (Control) patients. This retrospective single-center study includes CXR and blood tests obtained between January 2017 and May 2020. Multimodal prediction models were generated using open-source deep learning algorithms. Performance of the MultiCOVID algorithm was compared with interpretations from five experienced thoracic radiologists on 300 random test images using the McNemar–Bowker test. A total of 8578 samples from 6123 patients (mean age, 66 ± 18 years [standard deviation]; 3523 men) were evaluated across datasets. For the entire test set, the overall accuracy of MultiCOVID was 84%, with a mean AUC of 0.92 (0.89–0.94). For the 300 random test images, the overall accuracy of MultiCOVID (69.6%) was significantly higher than that of the individual radiologists (range, 43.7–58.7%) and of the consensus of all five radiologists (59.3%, P < .001). Overall, we have developed a multimodal deep learning algorithm, MultiCOVID, that discriminates among COVID-19, heart failure, non-COVID pneumonia and healthy patients using both CXR and blood tests, with significantly better performance than experienced thoracic radiologists.

Subject terms: Computational biology and bioinformatics, Biomarkers, Diseases, Health care

Introduction

The outbreak of Coronavirus Disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), had struck the worldwide population with more than 200 million cases and 4.5 million deaths by August 2021. The rapid spread of the pandemic led to a global overexertion of health care and research facilities to counteract the growing rate of infection. However, a collapse of the healthcare system was imminent and inevitable worldwide, and new technologies were needed to speed up the diagnostic process.

The reference standard for COVID-19 diagnosis is the detection of SARS-CoV-2 viral RNA by real-time polymerase chain reaction (RT-PCR). However, the massive volume of sample-processing requests at the beginning of the pandemic caused serious delays in obtaining results.

As lung involvement is one of the main causes of morbidity and mortality in SARS-CoV-2 infection, a quick identification of characteristic findings in chest imaging can support the diagnosis and speed up the identification of COVID-19 positive patients at the emergency units.

Several studies have shown that the implementation of deep learning (DL) tools to detect chest X-ray (CXR) findings typically associated with SARS-CoV-2 infection delivers results comparable to those obtained by radiologist interpretation. However, most of the trained models show a drop in prediction performance when tested on external datasets1. In addition, one of the main hurdles to overcome when training an algorithm to detect SARS-CoV-2 infection in CXR is the similarity of its findings to those of other entities such as bacterial pneumonia or heart failure2. On the other hand, models based on laboratory results from peripheral blood also give predictive results on diagnosis3 and prognosis4.

A key fact to highlight is that the incursion of COVID-19 caused a dramatic drop in emergency room consultations for other pathologies. Later on, after the initial peak, the decline in COVID-19 prevalence made non-COVID diseases emerge once again at the hospitals. This is relevant because of the challenge of performing an efficient differential diagnosis against selected pathologies during a pandemic. It is well known that the predictive value of a diagnostic test is conditioned by the prevalence of the disease, and the prevalence of COVID-19 varied widely throughout the different waves of the pandemic5. A multicategory approach that takes into account differential diagnoses whose prevalence is more stable could reduce this variability.

With the objective of improving and accelerating the diagnosis of COVID-19, we developed a tool to assist physicians in reaching a diagnosis: a multimodal prediction algorithm (MultiCOVID) based on CXR and blood tests, with the ability to discriminate between COVID-19, Heart Failure (HF), Non-COVID Pneumonia (NCP) and healthy (Control) samples.

Materials and methods

Dataset

We retrospectively collected CXR images and hemogram values from 8578 samples from 6123 patients and healthy subjects (mean age, 66 ± 18 years [standard deviation]; 3523 men) from the Parc de Salut Mar (PSMAR) Consortium, Barcelona, Spain. Four cohorts were designed: (i) 1171 samples from patients diagnosed with COVID-19 by RT-PCR from March to May 2020; (ii) 1008 samples from patients who suffered an episode of heart failure between 2012 and 2019; (iii) 490 samples from patients diagnosed with non-COVID pneumonia (NCP) from 2018 to 2019; and (iv) 5909 samples from standard preoperative studies of healthy subjects from 2017 to 2019 (Fig. 1). HF and NCP diagnoses were selected as defined by their International Classification of Diseases, Tenth Revision (ICD-10) codes. All CXR images from groups i–iii were validated by two independent radiologists (MB and JM).

Figure 1.

Flowchart for sample selection and patient inclusion in the study, and breakdown of the training, validation, and hold-out test datasets. Around 25,000 entries were obtained by pairing CXR images and blood tests in a time-wise manner. The whole dataset totals 8822 entries of paired CXR and blood test data. Samples with low completeness (less than 80% of blood test data available) were discarded for model building.

Acquisition of blood sample and image data

We included CXR images performed in a period ranging from 1 day before the patient's diagnosis to 7 days after. The images were filtered to include only frontal projections, regardless of quality and of the radiography system used. Blood sample results were collected within a range of 2 days before to 7 days after the CXR acquisition date using the PSMAR lab record system, except for control samples, whose measurements ranged over 2 weeks. If two or more blood test results were collected, the measurements were averaged.
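The matching-window rule above can be sketched as follows. This is a minimal illustration with hypothetical field names and data layout; the actual pipeline queried the PSMAR lab record system:

```python
from datetime import date, timedelta

def match_blood_tests(cxr_date, blood_tests, days_before=2, days_after=7):
    """Average all blood test panels drawn within the allowed window
    around the CXR acquisition date (hypothetical data layout).

    blood_tests: list of (draw_date, {analyte: value}) tuples.
    Returns a dict of analyte -> mean value, or None if no panel matches.
    """
    lo = cxr_date - timedelta(days=days_before)
    hi = cxr_date + timedelta(days=days_after)
    panels = [values for d, values in blood_tests if lo <= d <= hi]
    if not panels:
        return None
    analytes = set().union(*panels)
    # average each analyte over the panels that report it
    return {a: sum(p[a] for p in panels if a in p) /
               sum(1 for p in panels if a in p)
            for a in analytes}

# Two hemograms fall inside the window and are averaged per analyte;
# the third is outside the window and is ignored.
tests = [(date(2020, 3, 10), {"hemoglobin": 13.0, "leukocytes": 6.4}),
         (date(2020, 3, 14), {"hemoglobin": 12.0, "leukocytes": 6.8}),
         (date(2020, 4, 1),  {"hemoglobin": 11.0})]
avg = match_blood_tests(date(2020, 3, 11), tests)
```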

CXR images and blood test results were combined in the same dataset and split into a train/validation set (90%) and a hold-out test set (10%). For the training/validation split, we divided the dataset into training (80%) and validation (20%) sets with 5 different random seeds. We ensured that there were no cross-over patients between groups.
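A patient-level split of this kind, which guarantees that no patient crosses between the training and validation sides, can be sketched as below. Function and variable names are illustrative, not taken from the paper's code base:

```python
import random
from collections import defaultdict

def patient_level_split(samples, valid_frac=0.2, seed=0):
    """Split samples into train/validation so that all samples from one
    patient fall on the same side (no cross-over patients), as done for
    the 80/20 training/validation splits with 5 random seeds.

    samples: list of (patient_id, sample) pairs.
    """
    by_patient = defaultdict(list)
    for pid, sample in samples:
        by_patient[pid].append(sample)
    pids = sorted(by_patient)
    rng = random.Random(seed)          # seed makes the split reproducible
    rng.shuffle(pids)
    n_valid = round(len(pids) * valid_frac)
    valid_pids = pids[:n_valid]
    train = [(p, s) for p in pids[n_valid:] for s in by_patient[p]]
    valid = [(p, s) for p in valid_pids for s in by_patient[p]]
    return train, valid

# e.g. five different splits, one per seed, as used for the 5 Joint models
data = [(f"pt{i}", f"cxr{i}_{j}") for i in range(10) for j in range(2)]
splits = [patient_level_split(data, seed=s) for s in range(5)]
```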

Deep learning models

A detailed description of the models, training policy and image preprocessing is provided in the Supplementary Material. In brief, the segmentation model is based on a U-Net architecture6. The CXR-only classification model consists of a validated convolutional neural network (CNN) with a ResNet-34 architecture7. The Blood-only (tabular) model is an attention-based network (TabNet)8. The Joint model is a multimodal deep learning algorithm that merges the CXR-only and Blood-only models and uses both the CXR image and blood tests as input; it uses Gradient Blending to prevent overfitting and improve generalization9. The MultiCOVID model is an ensemble of 5 different Joint models that classify independently between the different classes and then assign the final classification by majority vote. The whole pipeline development and training was performed using the fastai deep learning API10.
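The ensemble's majority vote over the 5 Joint models can be illustrated with a short sketch. The deterministic tie-break rule used here is an assumption; the text does not specify how ties are resolved:

```python
from collections import Counter

CLASSES = ["COVID-19", "Control", "HF", "NCP"]

def majority_vote(predictions):
    """Combine the class predictions of the 5 Joint models by majority
    vote. Ties are broken by the first-listed class among the tied ones
    (an assumption for this sketch, not specified in the paper)."""
    counts = Counter(predictions)
    best = max(counts.values())
    for c in CLASSES:  # deterministic tie-break over a fixed class order
        if counts.get(c, 0) == best:
            return c

# three of five models agree, so the ensemble outputs COVID-19
label = majority_vote(["COVID-19", "HF", "COVID-19", "COVID-19", "NCP"])
```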

Comparison with thoracic radiologist interpretations

A hold-out test dataset consisting of 300 samples (ensuring no patient overlap with the training or validation sets) was used for expert interpretation. Each sample consisted of a CXR with matched blood results. Expert interpretations were independently provided by five board-certified thoracic radiologists (FZ, SC, LdC, DR, AG) with 2–30 years of post-residency experience. The radiologists could review both the non-segmented images and the blood test results, without any other additional information, in a platform created ad hoc for prediction. They assigned each case to one of the four categories (COVID-19, Control, HF and NCP). A consensus interpretation for the radiologists was obtained by majority vote for each paired CXR–blood test analyzed.

Statistical analysis

A two-tailed t-test P value was reported when clinical and population blood test differences were assessed. The McNemar–Bowker test was used to compare model performance against the radiologist majority vote, using FDR correction. Plotting and statistical analyses were performed with the ggplot, ggpubr and rcompanion packages in R, version 3.6 (R Core Team; R Foundation for Statistical Computing).
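The McNemar–Bowker test of symmetry on a k × k table of paired classifications (e.g. MultiCOVID prediction vs. radiologist consensus) can be sketched in pure Python. The actual analysis used the rcompanion package in R; here the chi-square p-value comes from a series expansion of the regularized incomplete gamma function:

```python
import math

def chi2_sf(stat, df):
    """Chi-square survival function via the series expansion of the
    regularized lower incomplete gamma function P(df/2, stat/2)."""
    a, x = df / 2.0, stat / 2.0
    if x <= 0:
        return 1.0
    term = 1.0 / a
    total = term
    n = 0
    while term > total * 1e-16 and n < 10000:
        n += 1
        term *= x / (a + n)
        total += term
    p_lower = total * math.exp(a * math.log(x) - x - math.lgamma(a))
    return min(1.0, max(0.0, 1.0 - p_lower))

def bowker_test(table):
    """Bowker's test of symmetry on a k x k contingency table.
    Off-diagonal pairs with zero total are skipped, reducing df."""
    k = len(table)
    stat, df = 0.0, 0
    for i in range(k):
        for j in range(i + 1, k):
            s = table[i][j] + table[j][i]
            if s > 0:
                stat += (table[i][j] - table[j][i]) ** 2 / s
                df += 1
    p = chi2_sf(stat, df) if df else 1.0
    return stat, df, p

# A perfectly symmetric table of disagreements gives statistic 0, p = 1.
sym = [[10, 2, 1, 3],
       [2, 8, 4, 1],
       [1, 4, 5, 2],
       [3, 1, 2, 6]]
stat, df, p = bowker_test(sym)
```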

Ethical approval

The study was designed to use radiology images and associated clinical/demographic/laboratory patient information already collected for the purpose of performing clinical COVID-19 research by Hospital del Mar. The study was conducted in accordance with the relevant institutional guidelines and regulations. The experimental protocols, data acquisition and analysis were approved by the Parc de Salut Mar Clinical Research Ethics Committee (2020/9199/I). Informed consent was obtained, when possible, from patients or their legal representatives, or was waived by the Parc de Salut Mar Clinical Research Ethics Committee (2020/9199/I) when informed consent could not be obtained due to the pandemic situation.

Results

Patient characteristics

A total of 8578 samples were evaluated across datasets. Patient characteristics and blood test parameters are shown in Table 1. A highly significant difference in age was found between the cohort of patients with heart failure (82.8 ± 10 years) and the other three cohorts (66.0 ± 16 years for COVID-19 samples, 63.2 ± 18 years for Control samples and 67.8 ± 17 years for NCP samples; P < 0.001 for each comparison), so age was not considered a valid variable for further classification.

Table 1.

Patient characteristics.

| Characteristic | Overall | COVID-19 | Control | HF | NCP |
|---|---|---|---|---|---|
| n | 8578 | 1171 | 5909 | 1008 | 490 |
| Age, mean (SD) [years] | 66.128 (18.353) | 66.013 (16.612) | 63.165 (18.281) | 82.823 (10.731) | 67.786 (17.011) |
| Sex, n (%): H | 5023 (58.557) | 677 (57.814) | 3598 (60.890) | 460 (45.635) | 288 (58.776) |
| Sex, n (%): M | 3555 (41.443) | 494 (42.186) | 2311 (39.110) | 548 (54.365) | 202 (41.224) |
| % basophils [%] | 0.300 [0.200, 0.525] | 0.200 [0.100, 0.300] | 0.400 [0.200, 0.600] | 0.333 [0.200, 0.500] | 0.300 [0.175, 0.500] |
| Total basophils [×10³/µL] | 0.030 [0.020, 0.050] | 0.010 [0.010, 0.020] | 0.040 [0.020, 0.055] | 0.030 [0.020, 0.050] | 0.030 [0.017, 0.060] |
| % eosinophils [%] | 0.700 [0.100, 1.900] | 0.000 [0.000, 0.300] | 0.950 [0.175, 2.150] | 0.900 [0.200, 2.100] | 0.600 [0.000, 2.200] |
| Total eosinophils [×10³/µL] | 0.060 [0.010, 0.160] | 0.000 [0.000, 0.020] | 0.080 [0.020, 0.180] | 0.070 [0.020, 0.160] | 0.060 [0.005, 0.200] |
| MCH [pg] | 29.650 [28.300, 30.900] | 29.400 [28.300, 30.550] | 29.800 [28.433, 31.050] | 29.200 [27.387, 30.700] | 29.600 [28.500, 30.700] |
| Hematocrit [%] | 38.000 [33.100, 42.250] | 40.000 [36.600, 43.300] | 38.500 [33.600, 42.700] | 35.200 [31.400, 39.200] | 32.025 [28.512, 36.100] |
| Red blood cells [×10⁶/µL] | 4.290 [3.710, 4.809] | 4.550 [4.080, 4.935] | 4.360 [3.743, 4.860] | 3.930 [3.494, 4.370] | 3.630 [3.160, 4.070] |
| Hemoglobin [g/dL] | 12.600 [10.800, 14.150] | 13.300 [12.000, 14.433] | 12.900 [11.067, 14.400] | 11.300 [10.000, 12.700] | 10.500 [9.200, 12.100] |
| Leukocytes [×10³/µL] | 9.020 [6.771, 12.100] | 6.420 [5.060, 8.860] | 9.450 [7.260, 12.655] | 8.660 [6.894, 11.072] | 10.950 [7.700, 14.781] |
| % lymphocytes [%] | 15.350 [8.333, 25.100] | 14.900 [9.475, 21.900] | 16.800 [8.400, 27.600] | 12.700 [8.200, 18.912] | 10.100 [5.763, 17.850] |
| Total lymphocytes [×10³/µL] | 1.310 [0.800, 2.010] | 0.930 [0.688, 1.270] | 1.525 [0.920, 2.240] | 1.070 [0.740, 1.560] | 1.140 [0.605, 1.680] |
| MCHC [g/dL] | 33.233 [32.200, 34.167] | 33.000 [32.100, 33.900] | 33.467 [32.500, 34.400] | 32.300 [31.300, 33.200] | 32.717 [31.600, 33.750] |
| % monocytes [%] | 7.100 [5.300, 8.900] | 6.500 [4.400, 9.000] | 7.100 [5.400, 8.800] | 8.000 [6.237, 9.812] | 6.300 [4.308, 8.600] |
| Total monocytes [×10³/µL] | 0.635 [0.442, 0.840] | 0.415 [0.290, 0.600] | 0.660 [0.480, 0.870] | 0.690 [0.510, 0.890] | 0.690 [0.440, 0.890] |
| % neutrophils [%] | 75.200 [63.600, 84.400] | 77.000 [69.100, 84.600] | 73.467 [60.900, 84.100] | 76.800 [69.183, 82.900] | 81.050 [70.925, 89.100] |
| Total neutrophils [×10³/µL] | 6.450 [4.400, 9.650] | 4.870 [3.470, 7.095] | 6.673 [4.520, 10.080] | 6.470 [4.950, 8.654] | 8.418 [5.500, 12.360] |
| P-LCR [%] | 30.700 [25.167, 36.900] | 30.800 [25.300, 36.300] | 30.317 [24.950, 36.500] | 32.858 [26.992, 38.400] | 32.300 [24.000, 40.413] |
| PDW [fL] | 12.600 [11.100, 14.400] | 12.600 [11.075, 14.200] | 12.500 [11.100, 14.300] | 13.000 [11.400, 14.900] | 12.850 [10.800, 15.400] |
| Platelets [×10³/µL] | 222.333 [173.000, 281.000] | 201.000 [157.500, 267.000] | 227.000 [178.000, 280.333] | 215.500 [168.000, 269.000] | 229.500 [161.250, 373.625] |
| RDW-CV [%] | 13.900 [13.050, 15.300] | 13.300 [12.600, 14.100] | 13.750 [13.000, 15.100] | 15.325 [14.350, 17.100] | 14.800 [13.813, 16.275] |
| RDW-SD [fL] | 45.100 [41.700, 49.650] | 43.200 [40.600, 46.225] | 44.600 [41.337, 48.850] | 49.833 [46.300, 54.975] | 48.100 [45.100, 52.425] |
| MCV [fL] | 89.000 [85.221, 92.700] | 88.800 [85.700, 92.300] | 88.800 [85.000, 92.400] | 89.950 [85.300, 94.263] | 90.250 [86.806, 93.681] |
| MPV [fL] | 10.700 [10.000, 11.500] | 10.700 [10.100, 11.450] | 10.667 [10.000, 11.400] | 11.000 [10.300, 11.700] | 10.900 [9.950, 11.950] |

COVID-19, coronavirus disease 2019; HF, heart failure; NCP, non-COVID pneumonia; MCH, mean corpuscular hemoglobin; MCHC, mean corpuscular hemoglobin concentration; P-LCR, platelet-large cell ratio; PDW, platelet distribution width; RDW, red cell distribution width; MCV, mean corpuscular volume; MPV, mean platelet volume.

Whole CXR models learn spurious characteristics for classification

Previous studies have demonstrated that deep learning (DL)-based algorithms should be rigorously evaluated because of their ability to learn non-relevant features in order to increase their prediction accuracy1. For this reason, we first developed a segmentation algorithm able to segment lung parenchyma at 95% pixel accuracy. Then, after segmentation, we evaluated the accuracy of the algorithms on three complementary datasets: non-segmented images, segmented regions and excluded regions. After a few training epochs, the three different models achieved non-random accuracies between 67 and 74% (Fig. 2A). However, attention map exploration showed that the different models based their predictions not only inside but also outside the lung parenchyma (Fig. 2B).

Figure 2.

Performance of visual models on whole CXR images. (A) Confusion matrix and overall accuracy using whole image, segmented and inverse segmented images, respectively for each category tested. (B) Raw image and Grad-CAM heatmap representation of an image for each category and model trained.

These observations showed that, although there are important features outside the lung parenchyma that may help the model classify between the different entities (e.g., heart size), there are other elements (e.g., oxygen nasal cannulas or intravenous (IV) catheters) that might confound the model. Thus, we decided to segment all CXR before training our models for prediction of diagnosis. To accomplish this task, we generated a dataset of 785 radiologist-level lung segmentations and trained a U-Net model to regenerate the whole CXR dataset keeping only the lung parenchyma.

Performance of single and multimodal models

To evaluate the prediction capacity of segmented CXR and blood sample data, we built different DL models using each source alone or in combination. Metric comparisons of all the single vision (CXR-only) and tabular (Blood-only) models are detailed in the Supplementary Material. As expected, CXR-only models gave more robust predictions across all 4 categories tested than Blood-only models (Fig. 3). This difference is stronger in the classes with fewer samples (HF and NCP), where CXR-only models could identify features in the CXR images that are characteristic of these two entities, whereas this was not possible with Blood-only models.

Figure 3.

Performance of the different models on the entries from the hold-out test datasets. Means for precision (green), sensitivity (blue), F1 score (yellow), AUC (red) and accuracy (black diamond) for each model type and category assessed, respectively. CXR-only models use only CXR images for 4-category classification. Blood-only models use blood tests as the source of information. The Joint model uses both CXR and blood tests as input for classification, and MultiCOVID is the majority vote of 5 different Joint models.

Model interpretability analysis of the Blood-only models, assessing feature importance with Shapley Additive exPlanations (SHAP)12, showed that patient classification was related to two different axes: the immune compartment and the red blood cell (RBC) compartment (Fig. 4A). The first axis seems strongly associated with COVID-19 classification and shows a specific signature in the blood counts (Fig. 4B, top). The second axis seems to subdivide patients between COVID-19/Control and HF/NCP, although the COVID-19 blood counts also seem statistically different from those of Control samples (Fig. 4B, bottom).

Figure 4.

Blood-only model interpretability by SHAP analysis. (A) Summary plot showing the mean absolute SHAP value of the ten most important features for the four classes. (B) Blood test values of the different features identified by SHAP analysis. RDW-CV: red cell distribution width; MCHC: Mean Corpuscular Hemoglobin Concentration; RBC: red blood cells.

The combination of CXR and blood tests using multimodal models, which combine inputs from tabular and image data to perform a global prediction, slightly increased the prediction capacity over the single models, even though DL tabular models alone are worse than machine learning (ML; XGBoost) models (Supplementary Table 1). This underpins the concept that adding new sources of information to the data can increase the ability of the models to generate better predictions13. Moreover, the joint approach used for building the MultiCOVID algorithm resulted in improved performance in the majority of the metrics analyzed (Fig. 3 and Supplementary Table 1).
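The idea of combining modality outputs into a global prediction can be illustrated with a simplified inference-time late-fusion sketch. Note this is not the paper's method: the Joint model merges the two networks and trains them with Gradient Blending, whereas the weighted average and the weights below are purely illustrative:

```python
def late_fusion(p_image, p_blood, w_image=0.5, w_blood=0.5):
    """Blend per-class probabilities from a CXR-only and a Blood-only
    model with a weighted average, then renormalize. Simplified
    illustration of multimodal fusion; the weights are assumptions."""
    blended = [w_image * a + w_blood * b for a, b in zip(p_image, p_blood)]
    z = sum(blended)
    return [v / z for v in blended]

# class order: [COVID-19, Control, HF, NCP]; the image model is confident
# in COVID-19 while the blood model is ambivalent, so fusion keeps COVID-19
probs = late_fusion([0.60, 0.20, 0.15, 0.05],
                    [0.40, 0.40, 0.10, 0.10])
```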

Comparison with expert thoracic radiologists

Finally, we compared the performance of the MultiCOVID algorithm with the interpretation of expert chest radiologists. This comparison was performed with 300 CXR randomly selected from the hold-out test set and independently reviewed by 5 radiologists together with the blood test results. The individual radiologists showed accuracies ranging from 43.7 to 58.7%. This value rose to 59.3% (178/300) when the consensus interpretation of all 5 radiologists, based on majority vote, was considered. Of note, the overall accuracy achieved by MultiCOVID was 69.6% (209/300), significantly higher than the consensus interpretation (P < 0.001). In addition, for COVID-19 prediction individually, MultiCOVID showed sensitivity similar to the radiologists' consensus but with much higher specificity, leading to significantly better performance when discerning COVID-19 versus Control and COVID-19 versus HF patients (P < 0.05 for both comparisons; Fig. 5).

Figure 5.

Comparison of the performance of the MultiCOVID model with consensus expert radiologist interpretations on a random sample of 300 images from the test set. The receiver operating characteristic (ROC) curves for each category (COVID-19, blue; Control, green; Heart Failure (HF), red; Non-COVID Pneumonia (NCP), magenta) are shown for MultiCOVID (DL) and for the consensus interpretation of the radiologists (majority vote). Sensitivity (Sens) and specificity (Spec) are also plotted for each category assessed. DL: deep learning.

Discussion

Diagnosis of COVID-19 is an evolving challenge. During the beginning of the pandemic and the successive peaks with high prevalence rates, a prompt and effective diagnosis was critical for proper patient isolation and evaluation. However, since the prevalence of COVID-19 oscillated, with fewer COVID-19 cases and more non-COVID cases between waves, it became important to differentiate COVID-19 from other diseases presenting similar visual characteristics on CXR.

During patient assessment in the emergency room, clinicians take into account different inputs for a proper diagnosis. First, the anamnesis, symptoms, vitals and physical findings guide the physician to an initial assumption. Based on this information, additional tests are requested (CXR, blood test, ECG and SARS-CoV-2 detection). The integration of these results allows the team to diagnose a patient accurately. However, this process is time consuming and sometimes findings are difficult to interpret, leading to misdiagnosis.

To improve this diagnostic process, we have developed and trained a multimodal deep learning algorithm based on a multiple-input approach that combines CXR images with blood sample data to identify COVID-19 with high sensitivity. This way we were able to manage the increased complexity of the dataset. Data from multiple sources are correlated and complementary to each other and can reflect patterns that are not present in single models alone13.

Hence, MultiCOVID is fed by two of the most common and fastest clinical tests requested in the emergency room (CXR and blood test) and can predict the presence of three different diseases (COVID-19, heart failure and non-COVID pneumonia) with similar CXR characteristics.

Analysis of the single models shows the importance of model interpretation. While CXR-only models could identify patterns outside the lung parenchyma that could diminish their generalization capacity9, Blood-only models could point to interesting cell populations that are differently represented in COVID-19 patients, leveraging their prediction capacity. In this context, the immune compartment plays an important role in the COVID-19 response, and it has already been published that COVID-19 patients present lower overall leukocyte counts and, more specifically, lower eosinophil counts14,15. Furthermore, oxygen transport seems to be affected, modulating the red cell population. In this regard, we found significant differences in the erythrocyte count and the hemoglobin concentration. Although most studies correlate the reduction of these values with severe COVID-19 patients16, this is the first dataset to compare them across these four categories at the time of diagnosis.

Moreover, although a huge amount of literature about COVID-19 diagnosis and prognosis has been published using only blood tests17–20 or CXR21–28, this is the first study that combines both parameters and compares their prediction capacity at diagnosis. Of note, only one previously published study integrates both blood tests and CXR severity scores to determine in-hospital death of COVID-19 patients29. Hence, it is clear that merging both sources of data leads to better prediction performance compared with the two single models alone, and that this difference is more pronounced where the number of cases is scarce. It is important to stress that this combination of data sources addresses the variable prevalence of COVID-19 cases during the pandemic, an issue that could not be solved in previous studies23,24.

Our study has several limitations. First, the algorithm was evaluated on a single center; thus, there was likely some degree of bias. Additionally, sample collection was performed in different time periods for each group of patients, which could introduce differences in CXR image acquisition, although this was partially addressed by the lung segmentation model, which removes the noise signal present outside the lung parenchyma. Finally, model performance could be influenced by potential shifts in the disease landscape due to COVID-19 variants and vaccination efforts, which could affect the generalizability and interpretation of our findings.

Conclusions

We have developed a multimodal deep learning algorithm, MultiCOVID, that discriminates among COVID-19, heart failure, non-COVID pneumonia and healthy patients using both CXR and blood tests, with significantly better performance than experienced thoracic radiologists.

Our approach and results suggest an innovative scenario in which COVID-19 could be distinguished from other similar diseases, facilitating triage in the emergency room in a low COVID-19 prevalence situation.

Supplementary Information

Supplementary Table 1. (15.8KB, docx)

Abbreviations

DL

Deep learning

CXR

Chest X-rays

AUC

Area under the receiver operating characteristic curve

COVID-19

Coronavirus disease 2019

RT-PCR

Reverse-transcription polymerase chain reaction

SARS-CoV-2

Severe acute respiratory syndrome coronavirus 2

HF

Heart failure

NCP

Non-COVID pneumonia

Author contributions

M.H.-W.: Data curation, Validation, Formal analysis, Investigation, Project administration, Supervision, Writing—original draft, Writing—review & editing; J.M.M.: Data curation, Formal analysis, Investigation, Validation, Writing—original draft, Writing—review & editing; M.B.: Data curation, Formal analysis, Investigation, Validation, Writing—original draft, Writing—review & editing; I.C.: Data curation, Validation, Writing—original draft, Writing—review & editing; A.A.: Data curation, Validation, Writing—original draft, Writing—review & editing; N.G.-G.: Investigation, Visualization, Project administration, Writing—original draft, Writing—review & editing; F.Z.: Validation, Writing—original draft, Writing—review & editing; S.C.: Validation, Writing—original draft, Writing—review & editing; L.A.D.C.: Validation, Writing—original draft, Writing—review & editing; D.R.: Validation, Writing—original draft, Writing—review & editing; Á.G.: Validation, Writing—original draft, Writing—review & editing; J.M.-R.: Project administration, Supervision, Writing—original draft, Writing—review & editing; A.M.-C.: Data curation, Project administration, Supervision, Writing—original draft, Writing—review & editing; B.B.: Data curation, Formal analysis, Investigation, Project administration, Supervision, Writing—original draft, Writing—review & editing; J.G.: Data curation, Formal analysis, Investigation, Visualization, Project administration, Supervision, Writing—original draft, Writing—review & editing.

Data availability

Our code base is provided on GitHub at https://github.com/Tato14/MultiCOVID, including the weights for each of the individually trained neural network architectures and the respective model weights for the weighted ensemble model. The datasets used and analyzed during the current study are available from the corresponding author on reasonable request. In order to correct for sample bias11, additional metadata from the DICOM image headers of the CXR will also be available upon request.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

The online version contains supplementary material available at 10.1038/s41598-023-46126-8.

References

1. DeGrave AJ, Janizek JD, Lee S-I. AI for radiographic COVID-19 detection selects shortcuts over signal. Nat. Mach. Intell. 2021;3:610–619. doi: 10.1038/s42256-021-00338-7.
2. Cleverley J, Piper J, Jones MM. The role of chest radiography in confirming covid-19 pneumonia. BMJ. 2020. doi: 10.1136/bmj.m2426.
3. Avila E, Kahmann A, Alho C, Dorn M. Hemogram data as a tool for decision-making in COVID-19 management: Applications to resource scarcity scenarios. PeerJ. 2020;8:e9482. doi: 10.7717/peerj.9482.
4. Razavian N, et al. A validated, real-time prediction model for favorable outcomes in hospitalized COVID-19 patients. npj Digit. Med. 2020;3:130. doi: 10.1038/s41746-020-00343-x.
5. Trevethan R. Sensitivity, specificity, and predictive values: Foundations, pliabilities, and pitfalls in research and practice. Front. Public Health. 2017;5:307. doi: 10.3389/fpubh.2017.00307.
6. Ronneberger O, Fischer P, Brox T. U-Net: Convolutional networks for biomedical image segmentation. 2015:234–241. doi: 10.1007/978-3-319-24574-4_28.
7. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE; 2016:770–778. doi: 10.1109/CVPR.2016.90.
8. Arik SÖ, Pfister T. TabNet: Attentive interpretable tabular learning. Proc. AAAI Conf. Artif. Intell. 2021;35(8):6679–6687.
9. Wang W, Tran D, Feiszli M. What makes training multi-modal classification networks hard? In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2020:12692–12702. doi: 10.1109/CVPR42600.2020.01271.
10. Howard J, Gugger S. Fastai: A layered API for deep learning. Information. 2020;11:108. doi: 10.3390/info11020108.
11. Garcia Santa Cruz B, Bossa MN, Sölter J, Husch AD. Public Covid-19 X-ray datasets and their impact on model bias—A systematic review of a significant problem. Med. Image Anal. 2021;74:102225. doi: 10.1016/j.media.2021.102225.
12. Lundberg SM, et al. From local explanations to global understanding with explainable AI for trees. Nat. Mach. Intell. 2020;2:56–67. doi: 10.1038/s42256-019-0138-9.
13. Ngiam J, et al. Multimodal deep learning. In: ICML. 2011.
14. Tan Y, Zhou J, Zhou Q, Hu L, Long Y. Role of eosinophils in the diagnosis and prognostic evaluation of COVID-19. J. Med. Virol. 2021;93:1105–1110. doi: 10.1002/jmv.26506.
15. Rahman A, et al. Hematological abnormalities in COVID-19: A narrative review. Am. J. Trop. Med. Hyg. 2021;104:1188–1201. doi: 10.4269/ajtmh.20-1536.
16. Lippi G, Mattiuzzi C. Hemoglobin value may be decreased in patients with severe coronavirus disease 2019. Hematol. Transfus. Cell Ther. 2020;42:116–117. doi: 10.1016/j.htct.2020.03.001.
17. Kukar M, et al. COVID-19 diagnosis by routine blood tests using machine learning. Sci. Rep. 2021;11:10738. doi: 10.1038/s41598-021-90265-9.
18. Bayat V, et al. A severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) prediction model from standard laboratory tests. Clin. Infect. Dis. 2021;73:e2901–e2907. doi: 10.1093/cid/ciaa1175.
19. Soltan AAS, et al. Rapid triage for COVID-19 using routine clinical data for patients attending hospital: Development and prospective validation of an artificial intelligence screening test. Lancet Digit. Health. 2021;3:e78–e87. doi: 10.1016/S2589-7500(20)30274-0.
20. Chen J, et al. Distinguishing between COVID-19 and influenza during the early stages by measurement of peripheral blood parameters. J. Med. Virol. 2021;93:1029–1037. doi: 10.1002/jmv.26384.
21. Hwang EJ, et al. Deep learning for chest radiograph diagnosis in the emergency department. Radiology. 2019;293:573–580. doi: 10.1148/radiol.2019191225.
22. Wang L, Lin ZQ, Wong A. COVID-Net: A tailored deep convolutional neural network design for detection of COVID-19 cases from chest X-ray images. Sci. Rep. 2020;10:19549. doi: 10.1038/s41598-020-76550-z.
23. Wehbe RM, et al. DeepCOVID-XR: An artificial intelligence algorithm to detect COVID-19 on chest radiographs trained and tested on a large U.S. clinical data set. Radiology. 2021;299:E167–E176. doi: 10.1148/radiol.2020203511.
24. Zhang R, et al. Diagnosis of coronavirus disease 2019 pneumonia by using chest radiography: Value of artificial intelligence. Radiology. 2021;298:E88–E97. doi: 10.1148/radiol.2020202944.
25. Baikpour M, et al. Role of a chest x-ray severity score in a multivariable predictive model for mortality in patients with COVID-19: A single-center, retrospective study. J. Clin. Med. 2022;11:2157. doi: 10.3390/jcm11082157.
  • 26.Nishio M, et al. Deep learning model for the automatic classification of COVID-19 pneumonia, non-COVID-19 pneumonia, and the healthy: A multi-center retrospective study. Sci. Rep. 2022;12:8214. doi: 10.1038/s41598-022-11990-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Sun Y, et al. Use of machine learning to assess the prognostic utility of radiomic features for in-hospital COVID-19 mortality. Sci. Rep. 2023;13:7318. doi: 10.1038/s41598-023-34559-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Nishio M, Noguchi S, Matsuo H, Murakami T. Automatic classification between COVID-19 pneumonia, non-COVID-19 pneumonia, and the healthy on chest X-ray image: Combination of data augmentation methods. Sci. Rep. 2020;10:17532. doi: 10.1038/s41598-020-74539-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Garrafa E, et al. Early prediction of in-hospital death of COVID-19 patients: A machine-learning model based on age, blood analyses, and chest x-ray score. Elife. 2021;10:e70640. doi: 10.7554/eLife.70640. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data


Supplementary Materials

Supplementary Table 1. (15.8KB, docx)

Data Availability Statement

Our code base is provided on GitHub at https://github.com/Tato14/MultiCOVID, including the weights for each of the individually trained neural network architectures and the corresponding weights for the weighted ensemble model. The datasets used and analyzed during the current study are available from the corresponding author on reasonable request. To allow correction of sample bias11, the additional metadata present in the DICOM image headers of the CXRs is also available upon request.
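The repository above bundles per-modality model weights together with weights for a weighted ensemble. As a minimal sketch of how such an ensemble can fuse the CXR and blood-test model outputs over the four diagnostic classes, the snippet below averages each model's class probabilities under fixed mixing weights; the mixing weights, logits, and class ordering here are illustrative assumptions, not the published MultiCOVID values.

```python
import numpy as np

CLASSES = ["COVID-19", "Heart failure", "Non-COVID pneumonia", "Control"]

def softmax(z):
    """Numerically stable softmax over the last axis."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def weighted_ensemble(cxr_logits, blood_logits, w_cxr=0.6, w_blood=0.4):
    """Fuse two models by a weighted average of their class probabilities.

    w_cxr + w_blood should equal 1 so the fused vector stays a
    probability distribution. The weights here are hypothetical.
    """
    probs = w_cxr * softmax(cxr_logits) + w_blood * softmax(blood_logits)
    return CLASSES[int(np.argmax(probs))], probs

# Example with made-up logits from the image and tabular models.
label, probs = weighted_ensemble(
    np.array([2.0, 0.1, 0.3, -1.0]),   # CXR model logits (hypothetical)
    np.array([1.5, 0.2, 0.9, -0.5]),   # blood-test model logits (hypothetical)
)
```

In practice the mixing weights would themselves be fitted on a validation split rather than fixed by hand.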


Articles from Scientific Reports are provided here courtesy of Nature Publishing Group
