Highlights
- The study analyzed laboratory features associated with severe/critical symptoms.
- A model for predicting which COVID-19 patients will progress to severe/critical cases was developed and validated.
- The model is expected to predict severe/critical symptoms and thus help save medical resources.
Keywords: COVID-19, critical/severe symptom, SVM, Prediction
Abstract
Background
Although the death rate of COVID-19 is less than 3 %, the fatality rate of severe/critical cases is high, according to the World Health Organization (WHO). Identifying patients who are likely to become severe/critical before symptoms occur could therefore save medical resources.
Methods and materials
In this study, all 336 patients infected with COVID-19 in Shanghai up to March 12th were retrospectively enrolled and divided into training and test datasets. In addition, 220 clinical and laboratory observations/records were collected. Clinical indicators associated with severe/critical symptoms were identified, and a model for predicting severe/critical symptoms was developed.
Results
In total, 36 clinical indicators significantly associated with severe/critical symptoms were identified. These indicators were mainly thyroxine and immune-related cells and products. A Support Vector Machine (SVM) using an optimized combination of age, GSH, CD3 ratio and total protein performed well in discriminating mild from severe/critical cases. The area under the receiver operating characteristic curve (AUROC) reached 0.9996 and 0.9757 in the training and testing datasets, respectively. With a cut-off value of 0.0667, the recall rate was 93.33 % and 100 % in the training and testing datasets, respectively. Cox multivariate regression and survival analyses revealed that the model significantly discriminated the severe/critical cases and captured the information of the selected clinical indicators.
Conclusion
The model was robust and effective in predicting severe/critical COVID-19 cases.
1. Introduction/Background
The COVID-19 (SARS-CoV-2) epidemic has caused 81,021 infections and 3194 deaths in China, according to statistics from March 14th, 2020. In other countries, 64,299 cases and 2234 deaths were reported. To current knowledge, the coronavirus shares 79 % of its sequence with SARS-CoV, which was prevalent in 2002–2003, especially in China, and 96 % with a bat coronavirus [1]. The virus uses ACE2 as its receptor for cell entry [2].
Clinical observations suggest that the incubation time for COVID-19 is 3–5 days, ranging from 0 to 24 days or more, similar to SARS [3]. According to a study in Wuhan, the mean incubation period was 5.2 days (95 % CI, 4.1–7.0 days), and the epidemic doubled every 7.4 days [4]. The R0 was estimated to be 2.24–3.58 [5]. In a previous study, the most common early clinical symptoms were fever (98 %), cough (76 %), dyspnea (55 %) and myalgia or fatigue (44 %). In addition, sputum production (28 %) and headache (8 %) were also reported [4]. Consistent with that study, fever (91.7 %), cough (75.0 %), fatigue (75.0 %), and gastrointestinal symptoms (39.6 %) were the most common clinical manifestations [6]. Laboratory findings including leukopenia (25 %), lymphopenia (25 %) and raised aspartate aminotransferase (37 %, including seven of 28 non-ICU patients) were also reported. In addition, abnormalities in AST, ALT, γ-GT, LDH and α-HBDH were reported [5]. Histopathologic changes and CT features have also been described [6,7].
Clinically, severe disease was defined as respiratory distress with a respiratory rate of more than 30 breaths/min, SpO2 < 93 % at rest, or PaO2/FiO2 ≤ 300 mmHg. Critical disease was defined as respiratory failure, shock or extrapulmonary organ failure [8]. However, mild cases may develop into severe or critical ones. Despite the effort devoted to CT-based early diagnosis of critical cases [9], its performance remains unclear, and a model for predicting which mild cases will become severe or critical has not yet been reported. This study aimed to identify the initial clinical observations or laboratory features significantly associated with severe/critical cases, and to predict whether the disease would develop into a severe/critical case. Machine learning has been emphasized as a tool for investigating COVID-19 [10].
2. Materials and methods
2.1. Sample enrollment and Clinical feature collection
This study was approved by the Ethics Committee of Shanghai Public Health Clinical Center, and all patients gave informed consent. All patients diagnosed by PCR in Shanghai from 2019-12-22 to 2020-3-12 were enrolled in this study. Because Shanghai Public Health Clinical Center is the only hospital designated for COVID-19 treatment, patients were transferred there days after the initial diagnosis; the clinical and laboratory features were generated at the Center, and these samples were treated as the initial ones. Temperature, heart rate and blood pressure were recorded when the patients reached the hospital.
Demographic information, laboratory features and clinical indicators were collected from the electronic record system of Shanghai Public Health Clinical Center and re-arranged manually by expert doctors. Access to the system was approved by the director of the hospital. History of hypertension, diabetes, coronary disease and tuberculosis was collected individually. Severe/critical symptoms were defined as meeting at least one of the following criteria: (a) respiratory frequency ≥30/min; (b) resting pulse oximeter oxygen saturation ≤93 %; or (c) oxygenation index (PaO2/FiO2) ≤ 300 mm Hg.
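The three criteria can be written as a simple rule. The following is a minimal sketch only, not the authors' code; the function name, argument names and the handling of missing measurements (skipped rather than counted) are assumptions made for illustration.

```python
from typing import Optional


def is_severe_or_critical(resp_rate: Optional[float] = None,
                          spo2_percent: Optional[float] = None,
                          pao2_fio2_mmhg: Optional[float] = None) -> bool:
    """Return True if at least one of criteria (a)-(c) is met.

    Missing measurements (None) are skipped; this is an assumption,
    the paper does not say how missing values were handled.
    """
    criteria = [
        # (a) respiratory frequency >= 30 per minute
        resp_rate is not None and resp_rate >= 30,
        # (b) resting pulse oximeter oxygen saturation <= 93 %
        spo2_percent is not None and spo2_percent <= 93,
        # (c) oxygenation index PaO2/FiO2 <= 300 mm Hg
        pao2_fio2_mmhg is not None and pao2_fio2_mmhg <= 300,
    ]
    return any(criteria)
```

For example, `is_severe_or_critical(resp_rate=22, spo2_percent=97, pao2_fio2_mmhg=410)` is false, while a resting saturation of exactly 93 % alone is enough to meet criterion (b).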
2.2. Laboratory assays
Pharyngeal swab specimens collected from each patient were used for COVID-19 viral nucleic acid detection by PCR assay, as previously described [6]. All laboratory data were generated at Shanghai Public Health Clinical Center according to the guidelines. The laboratory features include: Systolic pressure, Urine protein, Urinary red blood
2.3. Statistical analyses
For sample demographic analysis, Fisher’s exact test was used. For feature selection, both Student’s t-test and the Wilcoxon rank-sum test were applied to each clinical/laboratory feature, and features that differed significantly (p < 0.01) in both tests were retained. Survival analysis was implemented with the R package “survival”, using the critical/severe symptom as the event and the time to the critical/severe event as the survival time; p < 0.01 was considered significant. All analyses were performed on the R platform (v3.6).
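The "significant in both tests" rule can be sketched as a filter over per-feature p-values. This is an illustration in Python rather than the R code used in the study; the function name is hypothetical, the p-values are assumed to have been computed beforehand by a statistics package, and the numbers in the usage note are invented.

```python
def retain_features(p_values, alpha=0.01):
    """Keep features whose p-value is below alpha in BOTH tests.

    p_values maps a feature name to a pair
    (p from the t-test, p from the Wilcoxon rank-sum test).
    """
    return [
        name
        for name, (p_ttest, p_wilcoxon) in p_values.items()
        if p_ttest < alpha and p_wilcoxon < alpha
    ]
```

With made-up inputs such as `{"age": (1e-5, 3e-6), "urine_protein": (0.004, 0.03)}`, only `"age"` is retained, because the second feature passes one test but not the other.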
3. Results
3.1. Demographics and clinical characteristics
A total of 336 patients diagnosed with COVID-19 by PCR kit were enrolled in this study, including 310 non-severe/critical cases and 26 severe/critical cases (Table 1). Ten of the 26 severe/critical cases already showed critical/severe symptoms when they reached the hospital. Among all cases, 74 were from Wuhan, Hubei Province, 4 were from Iran, and the others were from other regions of China. The median age of all cases was 50 years; the median age was 48 for non-severe patients and 65 for severe or critical patients. Among these patients, 79 had hypertension, 29 had diabetes, 17 had coronary disease, and 4 had a history of tuberculosis.
Table 1.
Characteristics of samples enrolled. Note that not all information was collected.
| Characteristic | Category | All | Non-S/C | S/C | p value |
|---|---|---|---|---|---|
| Age | | 50 | 48 | 65 | 3.10E-06 |
| | | (36−49) | (35−62) | (63−75) | |
| Gender | Female | 158 | 152 | 6 | 0.013 |
| | Male | 177 | 157 | 20 | |
| Hypertension | No | 256 | 241 | 15 | 0.028 |
| | Yes | 79 | 68 | 11 | |
| Diabetes | No | 301 | 281 | 20 | 0.056 |
| | Yes | 29 | 24 | 5 | |
| Coronary disease | No | 319 | 298 | 21 | 0.0061 |
| | Yes | 17 | 12 | 5 | |
3.2. Clinical and laboratory features associated with severe/critical cases
The clinical and laboratory results of the enrolled patients were analyzed. In total, 249 laboratory and clinical records, including but not limited to liver function tests, blood tests and immunocytochemistry, were obtained from the initial assays performed within 24 h of the patients' arrival at the hospital. The data were re-arranged and cleaned, and features with few records were excluded; for example, fewer than 6 records of HBV load were available. Finally, 220 features were included. The features were compared between the non-severe/critical and severe/critical groups. Student's t-test and the Wilcoxon rank-sum test were used, and features with p < 0.01 in both tests were considered significantly associated with severe/critical symptoms. In total, 36 clinical and laboratory features were significantly associated with severe/critical cases (Fig. 1). These were mainly immune features (including CD3, CD4, CD19, CRP, super-sensitive CRP, leukomonocytes and neutrophils), thyroxine products (including triiodothyronine, free triiodothyronine, thyroxine and free thyroxine), and electrolyte balance (Na+, Cl−). Because severe/critical symptoms were already present in 10 of these 26 patients when they reached the hospital, these features may reflect the characteristics of severe/critical cases rather than early signs of them. In other words, these features might be useful for diagnosis rather than prediction. Thus, the severe/critical samples were further divided into two groups: one that did not show severe/critical symptoms when samples were collected and one that did. The statistical differences of these 33 features were re-analyzed. Interestingly, none of these features differed significantly between the two groups (Table S1). This may imply that various immune cells participate in severe/critical disease, and that the laboratory features are already altered before the onset of severe/critical symptoms.
Fig. 1.
The clinical indicators of severe/critical and non-severe/critical cases. A. The clinical feature values were z-score transformed. Red indicates high values, white indicates missing values and green indicates low values. The blue columns represent the mild cases while the red columns represent severe/critical samples. B. Violin plots of the indicators; the two groups on the x-axis in each panel are mild and severe cases, respectively, and the y-axis represents the values of the indicator (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.).
3.3. Support Vector Machine (SVM) for discriminating the severe/critical disease
Since the performance of single clinical indicators was not satisfactory, combinations of features were considered for prediction. Because the risk of over-fitting increases rapidly with the number of features, only combinations of fewer than five features were used. An exhaustive search (enumeration of all combinations of the features) over combinations of 2, 3, and 4 features was performed. The samples were divided into training and testing datasets. The training dataset contained 15 severe/critical cases and 178 mild cases, while the testing dataset consisted of 11 severe/critical cases and 132 mild cases. In this step, a Support Vector Machine (SVM) was trained on the training set using the selected features and used to predict the outcome in the testing set. The area under the receiver operating characteristic curve (AUROC) was used to evaluate the performance of the model in both the training and testing datasets. As expected, the models performed better than single features. The AUROC of the combination of age, GSH, CD3 percentage and total protein (the AUROC for each feature alone was 0.79.3, 0.7970, 0.8147 and 0.7443, respectively) reached 0.9996 (Fig. 2a) in the training dataset (Table 2). Thus, this combination was used; the performance of the model was also satisfactory in the testing dataset, with an AUROC of 0.9757 (Fig. 2b).
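To give a sense of the size of this search, the sketch below enumerates feature combinations and computes a rank-based AUROC. It is not the authors' implementation: the function names are hypothetical, the SVM fitting step is omitted, and the assumption that the search ran over the 36 significant features (rather than all 220) is not confirmed by the text.

```python
from itertools import combinations
from math import comb


def count_combinations(n_features, sizes=(2, 3, 4)):
    """Number of feature subsets of the given sizes."""
    return sum(comb(n_features, k) for k in sizes)


def enumerate_combinations(features, sizes=(2, 3, 4)):
    """Yield every subset of `features` with 2, 3 or 4 members."""
    for k in sizes:
        yield from combinations(features, k)


def auroc(scores, labels):
    """AUROC as the fraction of (positive, negative) pairs in which the
    positive case has the higher score; ties count as half a pair."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Under the 36-feature assumption, `count_combinations(36)` gives 66,675 candidate models (630 pairs, 7,140 triples and 58,905 quadruples). Each candidate would be fitted on the training set and ranked by `auroc` of its predicted values.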
Fig. 2.
Receiver operating characteristic (ROC) curves evaluating the performance of the SVM model in the training (A) and testing (B) datasets. The black dot marks the optimized cut-off value (0.0667).
Table 2.
The combinations that performed best in the training set using SVM models.
| Combinations | Training AUC | Testing AUC |
|---|---|---|
| Age, GSH, CD3 ratio, total protein | 0.999616858 | 0.975711 |
| Neutrophil percentage, albumin, GSH, CD4 ratio | 0.997318008 | 0.975711 |
| HCRP, Serum myoglobin, Cl, CD4 ratio | 0.998357964 | 0.969466 |
| Age, Cl, Calcium, LDH | 0.997318008 | 0.951748 |
| Age, Serum myoglobin, Retinol binding protein, Acid glycoprotein | 0.990960452 | 0.951423 |
| Neutrophil percentage, Procalcitonin, Serum myoglobin, total protein | 0.977024482 | 0.958362 |
The detailed prediction results are shown in Table 3. Using the optimized cut-off value of 0.0667, only one sample in the training dataset was a false negative; all the other samples were correctly predicted. When the same model was applied to the testing dataset, the recall rate was 100 %, with 15 false positives and no false negatives. In summary, the four-feature SVM model is robust and effective in predicting severe/critical patients.
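The recall figures can be reproduced from the confusion-matrix counts reported in Table 3. The short sketch below uses those counts; the function name and the additional metrics (specificity, precision) are added here for illustration and are not reported in the paper.

```python
def classification_metrics(tp, fn, fp, tn):
    """Recall, specificity and precision from confusion-matrix counts."""
    return {
        "recall": tp / (tp + fn),        # share of real positives found
        "specificity": tn / (tn + fp),   # share of real negatives found
        "precision": tp / (tp + fp),     # share of predicted positives that are real
    }


# Counts as listed in Table 3.
training = classification_metrics(tp=14, fn=1, fp=0, tn=174)
testing = classification_metrics(tp=11, fn=0, fp=15, tn=116)
```

Here `training["recall"]` is 14/15 (93.33 %) and `testing["recall"]` is 11/11 (100 %), matching the text. The 15 false positives in the testing set correspond to a specificity of 116/131 (about 88.5 %) and a precision of 11/26 (about 42.3 %).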
3.4. Performance of SVM model
The performance of the SVM model was further examined by survival analysis. Since only three deaths occurred among the patients enrolled in this study, the “event” was defined as the time at which clinical severe/critical symptoms were observed. Using the aforementioned cut-off value of 0.0667, the samples were divided into two groups, named the Low-risk and High-risk groups. Since the number of samples with severe/critical symptoms was limited, the training and validation sets were combined for further analyses. As expected, the High-risk group had a higher severe rate than the Low-risk group (Fig. 3a, p<1e-16). A proportion of cases already showed severe/critical symptoms at sampling, which may bias the analysis; these samples were therefore excluded in a second “survival” analysis. Consistent with the previous result, the severe/critical symptom rate of the High-risk group was again significantly higher than that of the Low-risk group (Fig. 3b, p<1e-16).
Fig. 3.
Performance of the model. “Survival” analysis of the High-risk and Low-risk groups in all samples (A) and in samples excluding cases that were already severe/critical at sampling (B). The predicted values in different groups (C).
In addition to the survival analyses, the predicted risk value was compared between severe/critical and mild cases. As expected, the risk value of severe/critical cases was significantly higher than that of mild cases (Fig. 3c). Cox multivariate regression showed that, among the features used in the model, GSH, total protein and CD3 percentage were not statistically significant, whereas age and the risk value were (Table 4). Notably, although age was statistically significant, its hazard ratio was much lower than that of the risk value (1.04 vs 33), indicating that the model is more informative than the individual features.
Table 3.
True positive, true negative, false positive and false negative values of the model in training and testing datasets.
| Training | Predicted Positive | Predicted Negative |
|---|---|---|
| Real Positive | 14 | 1 |
| Real Negative | 0 | 174 |

| Testing | Predicted Positive | Predicted Negative |
|---|---|---|
| Real Positive | 11 | 0 |
| Real Negative | 15 | 116 |
Table 4.
Cox multivariate regression using features and predicted values.
| Variables | HR | L95 %CI | H95 %CI | p-value |
|---|---|---|---|---|
| Age | 1.0425 | 1.0025 | 1.084 | 0.0368 |
| GSH | 0.9966 | 0.9744 | 1.019 | 0.7703 |
| CD3 Percent | 0.9817 | 0.9427 | 1.022 | 0.3715 |
| Total protein | 0.9307 | 0.8553 | 1.013 | 0.0958 |
| Predict value | 32.9883 | 8.6023 | 126.505 | 3.43E-07 |
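The rows of Table 4 can be cross-checked with the usual Wald relations for a Cox model: the hazard ratio is exp(β), the 95 % interval is exp(β ± 1.96·SE), and the p-value comes from z = β/SE. The sketch below recovers β, SE, z and p from a reported hazard ratio and interval. It assumes the table uses Wald intervals and two-sided Wald p-values, which the paper does not state, and the function name is hypothetical.

```python
from math import erfc, log, sqrt


def wald_from_hr(hr, lower, upper, z_crit=1.959964):
    """Recover (beta, se, z, p) from a hazard ratio and its 95 % CI,
    assuming a Wald interval exp(beta +/- z_crit * se)."""
    beta = log(hr)
    se = (log(upper) - log(lower)) / (2 * z_crit)
    z = beta / se
    p = erfc(abs(z) / sqrt(2))  # two-sided normal tail probability
    return beta, se, z, p


# Rows taken from Table 4.
_, _, z_age, p_age = wald_from_hr(1.0425, 1.0025, 1.084)
_, _, z_pred, p_pred = wald_from_hr(32.9883, 8.6023, 126.505)
```

Under these assumptions the recovered p-values come out close to the tabulated ones (about 0.037 for age and about 3.4E-07 for the predicted value), with small differences attributable to rounding of the published interval bounds.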
4. Discussion
The COVID-19 epidemic has placed a huge burden on medical resources because of its high severe/critical rate. In this study, clinical and laboratory features were analyzed, and 36 of them were found to be significantly associated with the clinical outcome (severe/critical symptoms) of patients infected with COVID-19. Interestingly, although some patients (10 of 26) already showed severe/critical symptoms while the others (16 of 26) were still mild when they underwent clinical and laboratory examinations, the features of all these cases were similar. The features also included dysfunction of immune cells and immune products, including CD3, CD4, CD19, CRP, high-sensitive CRP, leukomonocytes and neutrophils. Consistent with this, a previous study reported that severe cases have significantly higher leukocyte counts and CRP [6]. Taken together, these clues lead us to suspect that the acute immune response starts several days before severe/critical symptoms begin.
The lack of prediction models makes early detection difficult. Models for COVID-19 diagnosis and prognosis have been developed, with at least 27 studies and 31 prediction models reported [11]. Among these models, 10 addressed survival risk while only two aimed to predict progression to a severe or critical state. A recent study found that one demographic and six serological indicators (age, serum lactate dehydrogenase, C-reactive protein, the coefficient of variation of red blood cell distribution width (RDW), blood urea nitrogen, albumin and direct bilirubin) were associated with severe symptoms, which is consistent with our study [12]. That model had a sensitivity of 77.5 % and a specificity of 78.4 % in the validation cohort. Because the number of laboratory indicators in that study was limited, its sensitivity and specificity were not satisfactory. Another study collected data from 133 patients with mild symptoms in Wuhan and used multivariate logistic regression and AI techniques to predict which patients would develop severe symptoms; the best AUC achieved was 0.954. However, the sample size is the major concern [13]. Compared with these models, our model drew on over 220 clinical indicators, achieved better performance, and was further validated.
It was also noticed that triiodothyronine (T3), free triiodothyronine, thyroxine (T4) and free thyroxine were significantly lower in severe/critical patients. The AUROC of triiodothyronine reached 0.96. Although a correlation between thyroxine and severe/critical symptoms has not been reported in COVID-19 or MERS, a relationship between critical symptoms and thyroxine has been reported and could be used for prognosis. In addition, some SARS-infected patients had decreased T3 and T4 [14], which may be caused by necrosis of the thyroid [15].
The model can be used as follows: an SVM model is developed from the existing data, consisting of the clinical outcome (severe/critical symptoms) and the features (age, GSH, total protein and CD3 percentage); the corresponding data of each individual are then input, and the likelihood that the patient will develop S/C symptoms is generated. If the value is higher than the cut-off (0.0667), the patient is predicted to progress to S/C, and vice versa. The limitation of this study is the relatively small sample size (N = 336). Owing to the relatively advanced treatment available in the Shanghai region, the critical/severe symptom rate was low, which resulted in a limited number of severe/critical cases. In addition, among the patients with severe/critical symptoms, some already showed critical/severe symptoms when the samples were collected. In future work, we will collect and analyze more samples from other regions to further validate our model.
In summary, we analyzed more than 200 clinical and laboratory features and proposed an SVM-based model to predict the likelihood that patients will progress to severe/critical symptoms. The model was developed in the training dataset and validated in the testing dataset; the AUROC was 0.9996 and 0.9757, respectively, suggesting the robustness of the model.
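The final decision step amounts to comparing the model's predicted value with the cut-off. A minimal sketch follows, assuming the cut-off of 0.0667 reported in the Results and the High-risk/Low-risk group names used in the survival analysis; the function name is hypothetical, and the predicted value is assumed to come from an already fitted SVM, which is not shown.

```python
CUTOFF = 0.0667  # optimized cut-off value reported in the Results


def risk_group(predicted_value, cutoff=CUTOFF):
    """Assign a patient to the High-risk group if the model's predicted
    value is strictly higher than the cut-off, otherwise Low-risk."""
    return "High-risk" if predicted_value > cutoff else "Low-risk"
```

A predicted value of 0.5 would place a patient in the High-risk group, and a value of 0.01 in the Low-risk group. How a value exactly equal to the cut-off is handled is a design choice here; the text only says "higher than".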
Author contributions
SL, LG and SX designed the project; SF, SL, SN, LF and LG collected the data; LG, SL and LP analyzed the data; LG, SL and SX interpreted the results.
Declaration of Competing Interest
The authors declare no (potential) conflict of interest.
Acknowledgement
National Key R&D Program of China (2018YFB1307700) to Liping Sun.
Footnotes
Supplementary material related to this article can be found, in the online version, at doi:https://doi.org/10.1016/j.jcv.2020.104431.
Contributor Information
Liping Sun, Email: sunlp@sumhs.edu.cn.
Lining Sun, Email: lnsun@hit.edu.cn.
Yuxin Shi, Email: shiyx828288@163.com.
References
- 1. Li W., Shi Z., Yu M. Bats are natural reservoirs of SARS-like coronaviruses. Science. 2005;310(5748):676–679. doi: 10.1126/science.1118391. PubMed PMID: 16195424.
- 2. Zhou P., Yang X.-L., Wang X.-G. A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature. 2020;579(7798):270–273. doi: 10.1038/s41586-020-2012-7. PubMed PMID: 32015507.
- 3. Bauch C.T., Lloyd-Smith J.O., Coffee M.P. Dynamically modeling SARS and other newly emerging respiratory illnesses: past, present, and future. Epidemiology. 2005;16(6):791–801. doi: 10.1097/01.ede.0000181633.80269.4c. PubMed PMID: 16222170.
- 4. Huang C., Wang Y., Li X. Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. Lancet. 2020;395(10223):497–506. doi: 10.1016/S0140-6736(20)30183-5. PubMed PMID: 31986264.
- 5. Zhao D., Yao F., Wang L. A comparative study on the clinical features of COVID-19 pneumonia to other pneumonias. Clin. Infect. Dis. 2020;(March). doi: 10.1093/cid/ciaa247. PubMed PMID: 32161968.
- 6. Zhang H., Zhou P., Wei Y. Histopathologic changes and SARS-CoV-2 immunostaining in the lung of a patient with COVID-19. Ann. Intern. Med. 2020;(March). doi: 10.7326/m20-0533. PubMed PMID: 32163542.
- 7. Xia W., Shao J., Guo Y. Clinical and CT features in pediatric patients with COVID-19 infection: different points from adults. Pediatr. Pulmonol. 2020;(March). doi: 10.1002/ppul.24718. PubMed PMID: 32134205.
- 8. Zu Z.Y., Jiang M.D., Xu P.P. Coronavirus disease 2019 (COVID-19): a perspective from China. Radiology. 2020. doi: 10.1148/radiol.2020200490. PubMed PMID: 32083985.
- 9. Poggiali E., Dacrema A., Bastoni D. Can lung US help critical care clinicians in the early diagnosis of novel coronavirus (COVID-19) pneumonia? Radiology. 2020;(March). doi: 10.1148/radiol.2020200847. PubMed PMID: 32167853.
- 10. Tarnok A. Machine Learning, COVID-19 (2019-nCoV), and multi-OMICS. Cytometry Part A: J. Int. Soc. Anal. Cytol. 2020;97(March (3)):215–216. doi: 10.1002/cyto.a.23990. PubMed PMID: 32142596.
- 11. Wynants L., Van Calster B., Bonten M.M.J. Prediction models for diagnosis and prognosis of covid-19 infection: systematic review and critical appraisal. BMJ (Clin. Res. Ed.) 2020;369(April):m1328. doi: 10.1136/bmj.m1328. PubMed PMID: 32265220.
- 12. Gong J., Ou J., Qiu X. A tool to early predict severe 2019-novel coronavirus pneumonia (COVID-19): a multicenter study using the risk nomogram in Wuhan and Guangdong, China. medRxiv. 2020. doi: 10.1101/2020.03.17.20037515.
- 13. Bai X., Fang C., Zhou Y. Predicting COVID-19 malignant progression with AI techniques. medRxiv. 2020. doi: 10.1101/2020.03.20.20037325.
- 14. Leow M.K., Kwek D.S., Ng A.W. Hypocortisolism in survivors of severe acute respiratory syndrome (SARS). Clin. Endocrinol. 2005;63(August (2)):197–202. doi: 10.1111/j.1365-2265.2005.02325.x. PubMed PMID: 16060914.
- 15. Wei L., Sun S., Xu C.H. Pathology of the thyroid in severe acute respiratory syndrome. Hum. Pathol. 2007;38(January (1)):95–102. doi: 10.1016/j.humpath.2006.06.011. PubMed PMID: 16996569.