Dear Editor,
As of October 19, 2020, 40 million people had been infected with COVID-19 worldwide, and about 1.1 million had died from the disease.1 The causative virus belongs to the same coronavirus family as the MERS virus that circulated in 2015, but it is much more infectious, and the world is currently experiencing a pandemic.2 However, the factors affecting disease severity and mortality have not yet been clearly identified. Machine learning (ML) algorithms are well suited to the medical field because they can predict fairly accurately on large-scale, previously unseen inputs such as those arising from the COVID-19 pandemic.3 In this paper, we used ML algorithms to analyze the factors affecting severity and mortality in 8070 COVID-19 patients registered in the National Health Insurance Service (NHIS) of South Korea (NHIS-2020-1-479).
COVID-19 was classified as severe if the patient ultimately required at least one of the following: (1) intensive care unit (ICU) care; (2) extracorporeal membrane oxygenation (ECMO) treatment; (3) mechanical ventilator care; or (4) oxygen supply. Mortality was also assessed, because the NHIS data were linked to the Korea Disease Control and Prevention Agency and to Statistics Korea, which holds the mortality data.
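The composite severity outcome described above can be sketched as a simple "any of four" rule. This is only an illustration: the field names are hypothetical, and the letter does not say which NHIS claim codes identify each intervention.

```python
from dataclasses import dataclass


@dataclass
class Encounter:
    # Hypothetical flags; the underlying NHIS codes are not given in the letter.
    icu_care: bool = False
    ecmo: bool = False
    mechanical_ventilation: bool = False
    oxygen_supply: bool = False


def is_severe(e: Encounter) -> bool:
    """Return True if at least one of the four interventions was recorded."""
    return any([e.icu_care, e.ecmo, e.mechanical_ventilation, e.oxygen_supply])


print(is_severe(Encounter(oxygen_supply=True)))
print(is_severe(Encounter()))
```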
A total of 21 underlying diseases were examined in the 8070 COVID-19 patients: Hypertension (HTN), Diabetes mellitus (DM), Influenza, Cancer, Pulmonary disease, Angiotensin Converting Enzyme or Angiotensin Receptor Blocker (ARB) among hypertensive patients, Gastroesophageal reflux disease (GERD), Acute sinusitis (A_sinusitis), Chronic sinusitis (C_sinusitis), Osteoporosis, Cardiovascular disease (CVD), Angina, Peripheral vascular disease (PVD), Congestive heart failure (CHF), Depression, Rheumatologic disease (RA), Hepatitis, Myocardial infarction (MI), Inflammatory bowel disease (IBD), Non-tuberculosis mycobacterium (NTM), and olfactory loss (Anosmia). NHIS-customized data covering the past 5 years were obtained for patients with confirmed COVID-19, and their hospital use records over the same period were used to identify these underlying diseases.
A total of 8070 patients with confirmed COVID-19 were included in this study (Fig. 1A). Their average age was 39.9 years (SD: 19.7 years); 3236 (40.1%) were male and 4834 (59.9%) were female. Of the 785 patients classified as severe, 374 were men and 411 were women (p < 0.001). The mean age of severely ill patients was 61.6 years (SD: 16.0 years). A total of 248 patients died, of whom 136 were male and 112 were female (p = 0.0008). The average age of the patients who died was 72.1 years (SD: 10.2 years) (Fig. 1B).
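As a rough check on the sex difference in severity, the counts above can be arranged in a 2×2 table and tested. The letter does not state which test produced its p-values, so the Pearson chi-square test without continuity correction used here is an assumption, and the function name is illustrative.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, the chi-square upper tail equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Counts from the text: 374 of 3236 men and 411 of 4834 women were severe.
severe_m, total_m = 374, 3236
severe_f, total_f = 411, 4834
stat, p = chi_square_2x2(severe_m, total_m - severe_m,
                         severe_f, total_f - severe_f)
print(f"chi-square = {stat:.2f}, p = {p:.2e}")
```

Under this assumed test, the result is consistent with the reported p < 0.001 for severity; it is not meant to reproduce the authors' exact procedure.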
Fig. 1.
A. Flowchart of the entire study design. B. Age distribution of the 8070 COVID-19 confirmed patients in Korea in this study. C. Comorbidities of the 8070 COVID-19 confirmed patients in Korea in this study. D. ROC curves and AUC values in the prediction of severity of COVID-19. E. BD in the prediction of severity of COVID-19. F. Variable importance of the neural network model in the prediction of severity of COVID-19. G. ROC curves and AUC values in the prediction of mortality of COVID-19. H. BD in the prediction of mortality of COVID-19. I. Coefficient heatmap of the three logistic models in the prediction of mortality of COVID-19.
Regarding underlying diseases, 4572 patients had a history of pulmonary disease, 674 had a history of influenza, 231 had a record of ARB use, and 77 had anosmia (Fig. 1C).
Models were selected by comparing their area under the ROC curve (AUC) values. The best model for predicting severity was the neural network, with an AUC of 85.06%, followed by elastic net (EN) logistic regression (84.74%) (Fig. 1D). The most important variable for predicting severity in the neural network model was a history of influenza (relative importance: 0.083) (Fig. 1F, Table 1).
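To make the selection rule concrete, the sketch below computes AUC as the probability that a randomly chosen positive case receives a higher predicted risk than a randomly chosen negative case, then picks the model with the largest value. The labels, scores, and model names are toy values invented for illustration, not study data, and the authors' actual software is not described in the letter.

```python
def auc(labels, scores):
    """Rank-based AUC: share of positive/negative pairs ordered correctly (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy held-out outcomes and predicted risks for two hypothetical models.
y_true = [0, 0, 1, 0, 1, 1]
model_scores = {
    "model_a": [0.1, 0.4, 0.35, 0.2, 0.8, 0.7],
    "model_b": [0.3, 0.6, 0.2, 0.1, 0.9, 0.5],
}

aucs = {name: auc(y_true, s) for name, s in model_scores.items()}
best = max(aucs, key=aucs.get)
for name, value in aucs.items():
    print(f"{name}: AUC = {value:.3f}")
print("selected:", best)
```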
Table 1.
| Outcome | Model | Measure | Variable | Value | Outcome | Model | Measure | Variable | Value |
|---|---|---|---|---|---|---|---|---|---|
| Severity | Lasso | Estimated coefficient | Age | 1.276 | Mortality | Lasso | Nonzero estimated coefficient | Age | 2.203 |
| | | | DM | 0.431 | | | | Metropolitan | −0.783 |
| | | | Male | 0.415 | | | | Influenza | 0.763 |
| | | | Anosmia | −0.379 | | | | Anosmia | −0.684 |
| | | | HTN | 0.266 | | | | Male | 0.682 |
| | | | ARB | 0.222 | | | | NTM | 0.598 |
| | | | Influenza | 0.211 | | | | DM | 0.393 |
| | | | CVD | 0.209 | | | | HTN | 0.322 |
| | | | Pulmonary | 0.135 | | | | Pulmonary | 0.257 |
| | | | A_Sinusitis | 0.092 | | | | PVD | 0.243 |
| | Elastic net | Estimated coefficient | Age | 1.203 | | Elastic net | Nonzero estimated coefficient | Age | 2.136 |
| | | | DM | 0.442 | | | | Anosmia | −1.438 |
| | | | Anosmia | −0.413 | | | | Metropolitan | −0.985 |
| | | | Male | 0.397 | | | | NTM | 0.980 |
| | | | HTN | 0.309 | | | | Influenza | 0.830 |
| | | | CVD | 0.235 | | | | Male | 0.710 |
| | | | ARB | 0.234 | | | | DM | 0.405 |
| | | | Influenza | 0.222 | | | | HTN | 0.365 |
| | | | Pulmonary | 0.147 | | | | Pulmonary | 0.295 |
| | | | A_Sinusitis | 0.091 | | | | Depression | 0.280 |
| | Ridge | Estimated coefficient | Age | 1.006 | | Ridge | Nonzero estimated coefficient | Age | 1.389 |
| | | | Anosmia | −0.838 | | | | Anosmia | −1.388 |
| | | | DM | 0.480 | | | | NTM | 1.002 |
| | | | HTN | 0.419 | | | | Metropolitan | −0.761 |
| | | | Male | 0.400 | | | | Influenza | 0.722 |
| | | | Influenza | 0.397 | | | | HTN | 0.642 |
| | | | CVD | 0.326 | | | | Male | 0.582 |
| | | | NTM | 0.310 | | | | DM | 0.442 |
| | | | ARB | 0.301 | | | | Pulmonary | 0.352 |
| | | | MI | −0.214 | | | | Depression | 0.346 |
| | Random Forest | Mean decrease in Gini impurity | Age | 174.074 | | Random Forest | Mean decrease in Gini impurity | Age | 38.970 |
| | | | HTN | 51.519 | | | | HTN | 8.036 |
| | | | DM | 36.373 | | | | Male | 6.669 |
| | | | CVD | 20.110 | | | | DM | 6.427 |
| | | | Osteoporosis | 17.828 | | | | CVD | 5.842 |
| | | | Male | 16.432 | | | | PVD | 5.094 |
| | | | Pulmonary | 14.944 | | | | RA | 4.963 |
| | | | Cancer | 14.928 | | | | Osteoporosis | 4.883 |
| | | | ARB | 14.676 | | | | Cancer | 4.588 |
| | | | A_Sinusitis | 14.159 | | | | Pulmonary | 4.581 |
| | Bagging | Mean decrease in Gini impurity | Age | 193.724 | | Bagging | Mean decrease in Gini impurity | Age | 40.825 |
| | | | HTN | 60.297 | | | | Male | 9.054 |
| | | | DM | 35.551 | | | | HTN | 8.948 |
| | | | Male | 23.011 | | | | DM | 7.458 |
| | | | Pulmonary | 22.394 | | | | CVD | 7.336 |
| | | | Cancer | 21.379 | | | | Pulmonary | 7.193 |
| | | | Osteoporosis | 21.278 | | | | Cancer | 7.128 |
| | | | CVD | 20.893 | | | | RA | 7.006 |
| | | | A_Sinusitis | 20.869 | | | | Osteoporosis | 6.700 |
| | | | RA | 19.921 | | | | PVD | 6.366 |
| | Neural Network | Relative importance | Influenza | 0.083 | | Neural Network | Relative importance | CVD | 0.076 |
| | | | ARB | 0.075 | | | | Age | 0.074 |
| | | | Age | 0.062 | | | | Male | 0.0659 |
| | | | Anosmia | 0.060 | | | | RA | 0.062 |
| | | | C_Sinusitis | 0.059 | | | | C_Sinusitis | 0.053 |
| | | | A_Sinusitis | 0.055 | | | | Influenza | 0.051 |
| | | | Osteoporosis | 0.054 | | | | IBD | 0.048 |
| | | | MI | 0.051 | | | | PVD | 0.045 |
| | | | RA | 0.047 | | | | HTN | 0.045 |
| | | | NTM | 0.047 | | | | Pulmonary | 0.044 |
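For orientation, the lasso, elastic net, and ridge rows of Table 1 can be read against the standard penalized logistic regression objective. The letter gives neither the exact formulation nor the tuning values, so the notation below (coefficients β, penalty strength λ, mixing parameter α) follows the common convention and is an assumption rather than the authors' stated method.

```latex
\hat{\beta} = \arg\min_{\beta_0,\,\beta}
  \left\{
    -\frac{1}{n}\sum_{i=1}^{n}
      \left[ y_i\,(\beta_0 + x_i^{\top}\beta)
             - \log\!\left(1 + e^{\beta_0 + x_i^{\top}\beta}\right) \right]
    + \lambda \left[ \alpha\,\lVert\beta\rVert_1
                     + \frac{1-\alpha}{2}\,\lVert\beta\rVert_2^2 \right]
  \right\}
```

Setting α = 1 gives the lasso, which can shrink coefficients exactly to zero; α = 0 gives ridge, which shrinks coefficients without zeroing them; and 0 < α < 1 gives the elastic net, a mixture of the two.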
The best model for predicting death was the EN logistic regression model, with an AUC of 93.89%, followed by the lasso logistic regression model (93.84%) and the neural network model (93.73%) (Fig. 1G). The most important variables for mortality in the EN model were age (coefficient: 2.136) and anosmia (coefficient: −1.438) (Fig. 1I, Table 1).
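The ordering of variables in the EN mortality model can be reproduced from Table 1 by sorting on absolute coefficient size. This is a sketch under the assumption that importance is read as coefficient magnitude, which is only meaningful if the predictors are on comparable scales; the letter does not say whether they were standardized.

```python
# Elastic net mortality coefficients copied from Table 1.
en_mortality = {
    "Age": 2.136, "Anosmia": -1.438, "Metropolitan": -0.985, "NTM": 0.980,
    "Influenza": 0.830, "Male": 0.710, "DM": 0.405, "HTN": 0.365,
    "Pulmonary": 0.295, "Depression": 0.280,
}


def rank_by_magnitude(coefs):
    """Return variable names ordered from largest to smallest absolute coefficient."""
    return sorted(coefs, key=lambda name: abs(coefs[name]), reverse=True)


ranking = rank_by_magnitude(en_mortality)
for position, name in enumerate(ranking, start=1):
    # A positive sign means higher predicted mortality, a negative sign lower.
    direction = "higher" if en_mortality[name] > 0 else "lower"
    print(f"{position}. {name}: {en_mortality[name]:+.3f} ({direction} predicted mortality)")
```

The sign carries the direction of the association, so age (positive) and anosmia (negative) lead the list for opposite reasons.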
We analyzed 24 factors affecting severity and mortality in 8070 patients using recently developed ML algorithms. Foremost, influenza history was a very important variable for both COVID-19 severity (neural network 1st, ridge 6th) and mortality (EN 5th, lasso 3rd, ridge 5th) (Fig. 1I, Table 1). It has been reported that oseltamivir cannot prevent worsening of symptoms and disease in patients with COVID-19, as in vitro and retrospective studies in COVID-19 have found different molecular docking sites.7 Recent papers have reported that influenza vaccination can reduce the risk of death during the COVID-19 pandemic.4 Because the symptoms of influenza and COVID-19 are similar, it can be difficult to tell which disease is present, so vaccination can be important in preventing a "twindemic" of COVID-19 and influenza co-infection. In this paper, we studied the relationship between a history of influenza and the severity of COVID-19. Influenza can sometimes cause pulmonary fibrosis, a common sequela of virus-induced pneumonia, and this complication is presumed to increase the severity and mortality of COVID-19 infection. These results are in line with the current policy of recommending influenza vaccination, particularly given the ongoing COVID-19 epidemic and the prevalence of influenza from autumn to spring.
Anosmia was also identified as an important variable in predicting the severity of COVID-19. The best predictive models for mortality were the EN and lasso models; by absolute coefficient, anosmia ranked second in the EN model and fourth in the lasso model (Table 1). Its negative coefficient means that mortality was lower in patients with olfactory loss after the COVID-19 diagnosis. Previous papers indicate that recent olfactory loss in patients with mild to moderate COVID-19 is an important factor that differentiates COVID-19 from other infectious diseases, and that in most cases the sense of smell recovers well.5,6 Another paper reports that anosmia is associated with lower in-hospital mortality in COVID-19, which is in line with our results.7 Our study suggests that anosmia will remain an indicator that should be carefully examined in COVID-19 infection.8
In addition to old age and male sex, which are already known to be related to disease severity and mortality, influenza was found to be a major adverse factor in COVID-19. Anosmia, in contrast, was found to be a major factor associated with lower severity and mortality rates. Therefore, while no adequate COVID-19 treatment is available, the history of influenza vaccination and anosmia, in addition to age and sex, will be important indicators for predicting severity and mortality in COVID-19 patients.
Abbreviations: Receiver Operating Characteristic (ROC); Area Under the Curve (AUC); Binomial Deviances (BD); Hypertension (HTN); Diabetes mellitus (DM); Influenza; Cancer; Pulmonary disease; Angiotensin Converting Enzyme or Angiotensin Receptor Blocker (ARB) among hypertensive patients; Gastroesophageal reflux disease (GERD); Acute sinusitis (A_sinusitis); Chronic sinusitis (C_sinusitis); Osteoporosis; Cardiovascular disease (CVD); Angina; Peripheral vascular disease (PVD); Congestive heart failure (CHF); Depression; Rheumatologic disease (RA); Hepatitis; Myocardial infarction (MI); Inflammatory bowel disease (IBD); Non-tuberculosis mycobacterium (NTM); olfactory loss (Anosmia).
Author Contributions
Doo Hwan Kim: Contributed to the study design, protocol and study materials, collected study data, provided data access, and helped write the first draft of the manuscript (Methods and Results sections).
Min Gul Kim: Contributed to the study design, protocol, study materials and data analysis, and helped write the first draft of the manuscript (Methods and Results sections).
Seong J. Yang: Designed the statistical plan, assisted with data analysis and interpretation of the data, and helped write the first draft of the manuscript (Methods section).
Eun Jung Lee: Contributed to the study design, protocol, and study materials, and helped write the first draft of the manuscript (Results section).
Sang Woo Yeom: Collected the study data, performed the statistical analysis, and helped write the first draft of the manuscript (Methods section).
Yeon Seok You: Contributed to the study design, protocol and study materials, and collected study data.
Jong Seung Kim: Contributed to the study design, protocol and study materials, designed the statistical plan and data analysis, performed the statistical analysis, and wrote the first draft of the manuscript.
Funding
None
Declaration of Competing Interest
None
Acknowledgments
This paper was supported by a fund of the Biomedical Research Institute at Jeonbuk National University Hospital. We especially thank Professor Sam Hyun Kwon for the idea for this manuscript.
Footnotes
Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.jinf.2021.08.024.
References
- 1. https://www.who.int/emergencies/diseases/novel-coronavirus-2019
- 2. Goh G.K., Dunker A.K., Foster J.A., Uversky V.N. Rigidity of the outer shell predicted by a protein intrinsic disorder model sheds light on the COVID-19 (Wuhan-2019-nCoV) infectivity. Biomolecules. 2020;10. doi: 10.3390/biom10020331.
- 3. Rajkomar A., Dean J., Kohane I. Machine Learning in Medicine. N Engl J Med. 2019;380:1347–1358. doi: 10.1056/NEJMra1814259.
- 4. Grohskopf L.A., Liburd L.C., Redfield R.R. Addressing influenza vaccination disparities during the COVID-19 pandemic. JAMA. 2020;324:1029–1030. doi: 10.1001/jama.2020.15845.
- 5. Lee Y., Min P., Lee S., Kim S.W. Prevalence and duration of acute loss of smell or taste in COVID-19 patients. J Korean Med Sci. 2020;35:e174. doi: 10.3346/jkms.2020.35.e174.
- 6. Baron-Sanchez J., Santiago C., Goizueta-San Martin G., Arca R., Fernandez R. Smell and taste disorders in Spanish patients with mild COVID-19. Neurologia. 2020. doi: 10.1016/j.nrleng.2020.07.007.
- 7. Talavera B., García-Azorín D., Martínez-Pías E., Trigo J., Hernández-Pérez I., Valle-Peñacoba G., et al. Anosmia is associated with lower in-hospital mortality in COVID-19. J Neurol Sci. 2020;419. doi: 10.1016/j.jns.2020.117163.
- 8. Calica Utku A., Budak G., Karabay O., Guclu E., Okan H.D., Vatan A. Main symptoms in patients presenting in the COVID-19 period. Scott Med J. 2020;65:127–132. doi: 10.1177/36933020949253.