A deep learning model and machine learning methods for the classification of potential coronavirus treatments on a single human cell

Nour Eldeen M Khalifa; Mohamed Hamed N Taha; Gunasekaran Manogaran; Mohamed Loey

doi:10.1007/s11051-020-05041-z

This article has been retracted.

Retraction in: J Nanopart Res. 2021 Aug 16;23(8):184

J Nanopart Res. 2020 Oct 17;22(11):313. doi: 10.1007/s11051-020-05041-z


PMCID: PMC7568014 PMID: 33100894

Abstract

The coronavirus pandemic is straining healthcare systems around the world to the limit of their capacity. There is an urgent need to find a treatment for this virus as early as possible. Computer algorithms and deep learning can contribute by helping to find a potential treatment for SARS-CoV-2. In this paper, a deep learning model and machine learning methods for the classification of potential coronavirus treatments on a single human cell are presented. The dataset selected in this work is a subset of the publicly available datasets on RxRx.ai. The objective of this research is to automatically classify a single human cell according to the treatment type and the treatment concentration level. A deep convolutional neural network (DCNN) model and a methodology are proposed in this work. The core idea of the methodology is to convert the numerical features of the original dataset to the image domain and then feed them into a DCNN model. The proposed DCNN model consists of three convolutional layers, three ReLU layers, three pooling layers, and two fully connected layers. The experimental results show that the proposed DCNN model for treatment classification (32 classes) achieved 98.05% testing accuracy, outperforming classical machine learning methods such as support vector machine, decision tree, and ensemble. In treatment concentration level prediction, the classical machine learning (ensemble) algorithm achieved 98.5% testing accuracy while the proposed DCNN model achieved 98.2%. The performance metrics support the results obtained from the conducted experiments for both treatment classification and treatment concentration level prediction.

Keywords: COVID-19, Deep transfer learning, Classical machine learning

Introduction

The SARS virus spread around the world and caused widespread panic at the end of February 2003 (Chang et al. 2020; Chamola et al. 2020). This raised the alarm about viruses and their devastating impact in the new century. The novel coronavirus of 2019 was designated 2019-nCoV (COVID-19) by the World Health Organization (WHO) (Singhal 2020; Loey et al. 2020a). It was named SARS-CoV-2 by the International Committee on Taxonomy of Viruses (ICTV) in 2020 (Lai et al. 2020; Li et al. 2020; Sharfstein et al. 2020). The outbreak of SARS-CoV-2 had affected 213 countries and territories and caused more than 500,000 fatalities before the date of this article (Worldometer 2020). Person-to-person transmission of the coronavirus spread quickly, for example, in Italy (Giovanetti et al. 2020), the US (Holshue et al. 2020), India (Khattar et al. 2020), and Germany (Rothe et al. 2020). As of 10 July 2020, there were more than 12 million confirmed cases of SARS-CoV-2, 6 million recovered cases, and 550,000 deaths. Figure 1 shows some statistics about recovered and death cases of COVID-19 (Coronavirus (COVID-19) map 2020).

Fig. 1 — COVID-19 statistics in some countries

Most publications focus on the classification and detection of COVID-19 in X-ray and CT images (Civit-Masot et al. 2020; Waheed et al. 2020; Narayan Das et al. 2020; Ardakani et al. 2020). In this research, our focus is on identifying a drug that can help in recovery from COVID-19 and on studying the morphological effect of COVID-19. Today, deep learning (DL) is quickly becoming a crucial technology in image/video classification and detection (Loey et al. 2020b, c; Khalifa et al. 2019a). In this paper, a deep learning model and machine learning methods for the classification of potential coronavirus treatments on a single human cell are presented. The objective of this research is to automatically classify a single human cell according to the treatment type and the treatment concentration level. The novelty of this research lies in a proposed classification model based on deep learning and machine learning for COVID-19 treatments. The remainder of the paper is structured as follows. “Datasets characteristics” summarizes the dataset characteristics. “The proposed model” provides a detailed description of the proposed model. “Experimental results” reports and evaluates the experimental findings, and “Conclusion and future works” presents the conclusions and potential future research.

Datasets characteristics

The experiments in this research are based on the dataset presented in (Heiser et al. 2020). The dataset attributes are described in detail in Table 1. The data are publicly available at RxRx.ai under the name “RxRx19a Dataset”. It is a high-dimensional dataset that analyzes more than 1660 FDA-approved drugs in a human cellular model of SARS-CoV-2 infection and includes more than 300,000 recorded experiments. Although the data come from an in vitro screen that represents only a single human cell type, this dataset is likely broadly applicable to other primary human cell models.

Table 1.

RxRx19a dataset attributes description

Attribute	Description
site_id	Unique identifier of a given site
well_id	Unique identifier of a given well
cell_type	Cell type tested
Experiment	Experiment identifier
Plate	Plate number within the experiment
Well	Location on the plate
Site	Indication of the location in the well where the image was taken (1, 2, 3, or 4)
disease_condition	The disease condition tested in the well (mock, irradiated, or viral)
Treatment	Compound tested in the well
treatment_conc	Compound concentration tested (in μM)
Feature 1 to 1024	Features of the cells (1024 numerical attributes per record)


In this research, only a subset of the data is used in the experiments. The subset includes two cell types: VERO cells, a continuous cell lineage derived from kidney epithelial cells of an African green monkey, and human renal cortical epithelial (HRCE) cells. Records for both cell types were selected at treatment concentration levels of 10, 30, and 100 with active SARS-CoV-2. This subset includes 32 treatments and three treatment concentration levels across two cell types. Only 3750 cell records are included in the experiments carried out in this research.

The proposed model

The introduced model consists of three phases. The first phase is the preprocessing phase that converts the numerical values of the 1024 cell features to a digital image. The second phase is the training phase based on machine learning algorithms for numerical features and deep convolutional neural networks for the converted image features. The third phase is the testing phase and the evaluation of proposed model accuracy for treatment classification and treatment concentration level prediction. Figure 2 presents the proposed model structure.

Fig. 2 — The proposed model structure and phases

Preprocessing phase

The preprocessing phase includes (1) loading the 1024 cell features into computer memory, (2) mapping the cell features from their original numerical range, − 0.00046466477 to 4.508815065, to the image range [0, 255] according to equation (1), and (3) constructing an image by converting the data vector of 1024 cell features into a 32 × 32 pixel image according to the pseudocode presented in Algorithm 1. The result of this phase is 3750 images. Figure 3 illustrates a set of images after the preprocessing phase.

Fig. 3 — Examples of the converted cell images

\text{Pixel value} = \mathrm{Round}\left(\frac{\text{feature cell value} - (-0.00046466477)}{4.508815065} \times 255\right) \qquad (1)

where − 0.00046466477 is the minimum cell value and 4.508815065 is the maximum cell value in the 1024 features of cell data and 255 is the maximum value of the image domain.

Algorithm 1: Constructing image from 1024 features of the cell data vector
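The body of Algorithm 1 is not reproduced here, so the following is only a minimal sketch of the preprocessing step under stated assumptions. It applies equation (1) to each feature and arranges the 1024 pixel values into a 32 × 32 grid. The row-major ordering, the clamping to [0, 255], and the function names `feature_to_pixel` and `features_to_image` are illustrative assumptions, not details confirmed by the text.

```python
FEATURE_MIN = -0.00046466477  # minimum cell value reported in the text
FEATURE_MAX = 4.508815065     # maximum cell value reported in the text
SIDE = 32                     # 32 x 32 = 1024 pixels


def feature_to_pixel(value):
    """Apply equation (1): shift by the minimum, divide by the maximum, scale to 255."""
    pixel = round((value - FEATURE_MIN) / FEATURE_MAX * 255)
    # Clamp to the valid 8-bit range (an added safeguard, not stated in the text).
    return max(0, min(255, pixel))


def features_to_image(features):
    """Arrange a 1024-value vector as a 32 x 32 grid (row-major order is assumed)."""
    if len(features) != SIDE * SIDE:
        raise ValueError("expected 1024 features")
    pixels = [feature_to_pixel(v) for v in features]
    return [pixels[r * SIDE:(r + 1) * SIDE] for r in range(SIDE)]


# Hypothetical record: 1023 features at the minimum and one at the maximum.
image = features_to_image([FEATURE_MIN] * 1023 + [FEATURE_MAX])
```

Note that Python's built-in `round` rounds exact halves to the nearest even integer, which may differ from the rounding function used in the original experiments.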

Training phase

The training phase follows two methodologies. The first uses machine learning algorithms such as support vector machine, decision trees, and ensemble algorithms. The second relies on deep convolutional neural networks.

Support vector machine

SVM is one of the most common and effective machine learning techniques for recognition and regression. The SVM hinge term is shown in equation (2), where l is the label from 0 to 1, w · a − q is the output, w and q are the coefficients of the linear classifier, and a is the input vector. Equation (3) defines the loss function that is to be minimized (Çayir et al. 2018; Jogin et al. 2018).

SVM_{h_{k}} = \max\left(0,\ 1 - l_{k} (w \cdot a_{k} - q)\right) \qquad (2)

SVM_{loss} = \frac{1}{m} \sum_{t = 1}^{m} \max(0, h_{t}) \qquad (3)
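Equations (2) and (3) can be illustrated with a short sketch. It assumes the usual hinge-loss convention in which labels take the values −1 and +1; the text describes the label range differently, so this convention is an assumption of the example. The function names and the sample values are hypothetical.

```python
def hinge_term(label, weights, inputs, offset):
    """Equation (2): max(0, 1 - l * (w . a - q)) for one sample."""
    output = sum(w * a for w, a in zip(weights, inputs)) - offset
    return max(0.0, 1.0 - label * output)


def svm_loss(labels, weights, samples, offset):
    """Equation (3): mean of the per-sample hinge terms."""
    terms = [hinge_term(l, weights, a, offset) for l, a in zip(labels, samples)]
    return sum(terms) / len(terms)


# Hypothetical two-feature example with labels in {-1, +1}.
weights = [1.0, -1.0]
samples = [[2.0, 0.0], [0.0, 2.0], [0.5, 0.0]]
labels = [1, -1, 1]
loss = svm_loss(labels, weights, samples, offset=0.0)
```

The first two samples lie outside the margin and contribute zero, while the third lies inside the margin and contributes 0.5, so the mean loss is 0.5 / 3.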

Decision tree

The decision tree is a classification model based on entropy and knowledge acquisition (information gain). Entropy measures the amount of uncertainty in the data as shown in equation (4), where CD is the data, b_i is the i-th class output, and p(b_i) is the proportion of the data with that label. Knowledge acquisition (KA) is the reduction in entropy obtained by splitting the data, as illustrated in equation (5), where x is a subset of the data and p(x) is the proportion of the data that falls in x (Navada et al. 2011; Tu and Chung 1992).

Entropy(CD) = \sum_{i = 1}^{n} - p(b_{i}) \log(p(b_{i})) \qquad (4)

KA = Entropy(CD) - \sum_{x \in D} p(x)\, Entropy(x) \qquad (5)
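A small worked sketch of equations (4) and (5) follows. The logarithm base is not stated in the text; base 2 is assumed here, and the function names and toy labels are illustrative only.

```python
import math
from collections import Counter


def entropy(labels):
    """Equation (4): sum over classes of -p(b_i) * log(p(b_i)), with log base 2 assumed."""
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total) for c in Counter(labels).values())


def knowledge_acquisition(labels, subsets):
    """Equation (5): entropy of the data minus the weighted entropy of its subsets."""
    total = len(labels)
    weighted = sum(len(s) / total * entropy(s) for s in subsets)
    return entropy(labels) - weighted


# Toy data: two classes, split either perfectly or uselessly.
data = ["a", "a", "b", "b"]
perfect_split = knowledge_acquisition(data, [["a", "a"], ["b", "b"]])
useless_split = knowledge_acquisition(data, [["a", "b"], ["a", "b"]])
```

A split that separates the classes completely recovers the full entropy of the data, while a split that leaves each subset as mixed as the original gains nothing.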

Ensemble methods

Ensemble methods are machine learning algorithms that build several classifiers and classify new cases by combining their individual decisions (typically through weighted or unweighted votes) (Polikar 2012). The methods used are linear regression (Naseem et al. 2010), logistic regression (Kleinbaum and Klein 2002), and the K-nearest neighbors algorithm (k-NN) (Mangalova and Agafonov 2014). The ensemble output is computed as the weighted combination in equation (6) to achieve the best outcomes (Xiao et al. 2018).

\bar{y} = \sum_{k = 1}^{h} \alpha_{k} y_{k} \qquad (6)
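As a minimal illustration of equation (6), the sketch below combines the outputs of h models with weights α_k. How the weights are chosen is not described in the text; the values used here, and the assumption that they sum to one, are purely hypothetical.

```python
def weighted_ensemble(outputs, weights):
    """Equation (6): y_bar = sum over k of alpha_k * y_k."""
    if len(outputs) != len(weights):
        raise ValueError("one weight per model output is required")
    return sum(alpha * y for alpha, y in zip(weights, outputs))


# Hypothetical scores from three models for one sample.
model_outputs = [0.9, 0.6, 0.3]
alphas = [0.5, 0.3, 0.2]
combined = weighted_ensemble(model_outputs, alphas)
```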

Deep convolutional neural networks

The structure of the proposed deep convolutional neural network is presented in Fig. 4. The proposed DCNN consists of three main convolutional layers with a window size of 3 × 3 pixels, three ReLU layers, and three pooling layers. These layers act as feature extractors, while two fully connected layers act as classification layers. The proposed DCNN is the result of extensive architecture tuning based on the work presented in (Khalifa et al. 2018; Khalifa et al. 2019b; Khalifa et al. 2020; Loey et al. 2020d).
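To give a feel for how a 32 × 32 feature image shrinks through such a stack, the sketch below traces the spatial size after each convolution, ReLU, and pooling block. The padding of 1, the stride of 1, and the 2 × 2 pooling window with stride 2 are assumptions made for illustration; the text specifies only the 3 × 3 convolution window and the number of layers.

```python
def conv_output_size(size, kernel=3, padding=1, stride=1):
    """Spatial size after a convolution (padding=1 keeps a 3 x 3 convolution size-preserving)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_output_size(size, window=2, stride=2):
    """Spatial size after a pooling layer."""
    return (size - window) // stride + 1


def trace_feature_maps(input_size=32, blocks=3):
    """Return the spatial size after each convolution + ReLU + pooling block."""
    sizes = []
    size = input_size
    for _ in range(blocks):
        size = conv_output_size(size)  # 3 x 3 convolution; ReLU does not change the size
        size = pool_output_size(size)  # pooling halves the width and height
        sizes.append(size)
    return sizes


sizes = trace_feature_maps()
```

Under these assumptions the feature maps measure 16, 8, and 4 pixels per side after the three blocks, and the last one is flattened before the fully connected layers.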

One problem facing DCNNs is overfitting, which can be mitigated by data augmentation (Shorten and Khoshgoftaar 2019; El-Sawy et al. 2017a, b). Data augmentation increases the number of images used for training by applying label-preserving transformations. It is applied to the training set to make the resulting model more invariant to image transformations. In this work, each image in the training dataset is transformed as follows:

Reflection around X-axis.
Reflection around Y-axis.
Reflection around the X-Y axis.

The augmentation process raises the number of images from 3750 to 15,000, four times the size of the original dataset (each original image plus its three reflections). This leads to a significant improvement in the neural network training phase. Additionally, it makes the proposed DCNN less prone to memorizing the data and more robust.
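The three reflections can be sketched on an image stored as a list of rows. Interpreting the reflection around the X-Y axis as a reflection around both axes is an assumption of this example, as are the function names.

```python
def reflect_x(image):
    """Reflect around the X-axis: reverse the order of the rows."""
    return [row[:] for row in reversed(image)]


def reflect_y(image):
    """Reflect around the Y-axis: reverse each row."""
    return [row[::-1] for row in image]


def reflect_xy(image):
    """Reflect around both axes (assumed meaning of the X-Y reflection)."""
    return reflect_x(reflect_y(image))


def augment(image):
    """Return the original image followed by its three reflections."""
    return [image, reflect_x(image), reflect_y(image), reflect_xy(image)]


# Tiny 2 x 2 example instead of a full 32 x 32 feature image.
tiny = [[1, 2], [3, 4]]
augmented = augment(tiny)
```

Each input image yields four images in total, which matches the growth from 3750 to 15,000 images reported above.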

Testing phase

The testing phase is the phase where the proposed model proves its performance and efficiency. The main goals of the proposed model are correctly classifying the treatments based on numerical features by using machine learning algorithms and correctly classifying the treatment images of the features based on DCNN. Also, the prediction of the treatment concentration on every cell is based on numerical features and image features using both machine learning and DCNN.

For machine learning, the performance evaluation includes testing accuracy along with the receiver operating characteristic (ROC) curve under 5-fold cross-validation. For the DCNN, testing accuracy, precision, recall, and F1 score (Goutte and Gaussier 2010) are calculated from the confusion matrix. The performance metrics are presented in equations (7) to (10).

\text{Testing Accuracy} = \frac{TruePos + TrueNeg}{(TruePos + FalsePos) + (TrueNeg + FalseNeg)} \qquad (7)

\text{Precision} = \frac{TruePos}{TruePos + FalsePos} \qquad (8)

\text{Recall} = \frac{TruePos}{TruePos + FalseNeg} \qquad (9)

\text{F1 Score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \qquad (10)

where TruePos is the count of true positive samples, TrueNeg is the count of true negative samples, FalsePos is the count of false positive samples, and FalseNeg is the count of false negative samples from a confusion matrix.
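The four metrics can be computed directly from the confusion-matrix counts, as in the sketch below. The counts are made up for illustration and are not results from this work; for a multiclass problem the counts would have to be derived per class, and how the per-class values are averaged is not stated in the text.

```python
def classification_metrics(true_pos, true_neg, false_pos, false_neg):
    """Compute testing accuracy, precision, recall, and F1 score from confusion-matrix counts."""
    total = (true_pos + false_pos) + (true_neg + false_neg)
    accuracy = (true_pos + true_neg) / total
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts, chosen only to exercise the formulas.
metrics = classification_metrics(true_pos=90, true_neg=80, false_pos=10, false_neg=20)
```

The sketch does not guard against empty denominators, which occur when a class is never predicted or never present.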

Experimental results

The experiments are implemented using MATLAB software on a computer server with 96 GB of RAM and Intel Xeon processor (2 GHz). The following specifications are selected during the experiments:

For machine learning algorithms
- Three classifiers are tested (support vector machine, decision trees, and ensemble).
- Two problems (treatment classification and treatment concentration prediction).
- Dataset is in numerical format.
- 5-fold cross-validation is selected.
- Testing accuracy along with receiver operating characteristic (ROC) and area under curve (AUC) are selected as performance metrics.
For DCNN
- Using the proposed DCNN in “Training phase”.
- Two problems (treatment classification and treatment concentration prediction).
- Dataset is in digital image format.
- Dataset was divided into two sections (70% of the data for the training process and 30% for the testing process).
- Data augmentation is applied for treatment classification problems.
- Testing accuracy, precision, recall, and F1 score are selected as performance metrics.
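The two evaluation protocols listed above can be sketched with index lists. The experiments themselves were run in MATLAB; this Python sketch only illustrates the idea, and the random shuffling, the fixed seed, and the function names are assumptions (the text does not say whether the splits were random or stratified).

```python
import random


def train_test_split(n_samples, train_fraction=0.7, seed=0):
    """Shuffle sample indices and split them into training and testing parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    cut = round(n_samples * train_fraction)
    return indices[:cut], indices[cut:]


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists so that every sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


# 3750 records, as in the subset used in this work.
train_idx, test_idx = train_test_split(3750)
folds = list(k_fold_indices(3750, k=5))
```

With 3750 records, the 70/30 split gives 2625 training and 1125 testing records, and each of the five folds holds out 750 records.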

Treatment classification results

There are 32 classes of treatment in the subset selected from the original dataset; they are presented in Table 2. Treatment classification is evaluated with machine learning on the numerical format and with the DCNN on the digital image format.

Table 2.

Treatment classes according to the selected dataset

1-Deoxygalactonojirimycin	Darunavir	Indinavir	Penciclovir
Aloxistatin	Dimethyl fumarate	Indomethacin	Polydatin
Arbidol	Favipiravir	Lopinavir	Quinine
CAL-101	GS-441524	Methylprednisolone-sodium-succinate	Quinine hydrochloride
Camostat	Haloperidol	Nicotianamine	Quinine-ethyl-carbonate
Chloroquine	Hydroxychloroquine Sulfate	Oseltamivir-carboxylate	Remdesivir (GS-5734)
Cobicistat	Imiquimod	Pacritinib	Ribavirin
Ritonavir	Solithromycin	Tenofovir disoproxil fumarate	Thymoquinone


The first results are obtained with classical machine learning. Three classical machine learning algorithms are selected: DT, SVM, and ensemble. Table 3 presents the average testing accuracy of the selected machine learning algorithms using 5-fold cross-validation.

Table 3.

Testing accuracy using different machine learning algorithms

Family algorithm	DT	SVM	Ensemble
Child algorithm (best-achieved accuracy)	Fine-Tree (Damrongsakmethee and Neagoe 2019)	Cubic-SVM (Bagasta et al. 2019)	Subspace discriminant (Hang et al. 2015)
Average testing accuracy	57.7%	71.5%	72.7%


The ROC curve is one of the performance metrics for the machine learning algorithms. An ROC curve is a graph showing the performance of a classification model at all classification thresholds using the true positive rate and the false positive rate. Figure 5 presents a set of ROC curves of the different machine learning algorithms for one treatment, oseltamivir-carboxylate. The AUC provides an aggregate measure of performance across all possible classification thresholds. The AUC for oseltamivir-carboxylate was 73% using DT, 84% using SVM, and 86% using ensemble. About 96 ROC curves can be produced from the experimental trials, but there is no need to repeat the figures for the other treatments, as the testing accuracy is a good indicator of the quality of the machine learning algorithm.

Fig. 5 — ROC curves for treatment oseltamivir-carboxylate using a DT, b SVM, and c ensemble
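The relationship between the ROC curve and the AUC can be sketched as follows for a single treatment treated as the positive class against all others. The one-versus-rest framing, the scores, and the function names are assumptions for illustration, and the sketch assumes that no two samples share the same score.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs over all thresholds."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    tp = fp = 0
    # Lower the threshold one sample at a time, from the highest score down.
    for _, label in sorted(zip(scores, labels), reverse=True):
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical classifier scores; label 1 marks the treatment of interest.
mixed = auc(roc_points([0.9, 0.8, 0.7, 0.6], [1, 1, 0, 1]))
separated = auc(roc_points([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]))
```

When every positive sample scores above every negative sample the AUC is 1; in the first example one of the three positives ranks below the negative, so the AUC drops to 2/3.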

With the deep learning architecture, the achieved results are better than those of the machine learning algorithms in terms of testing accuracy and performance metrics. The proposed DCNN model, together with the conversion to the image domain and augmentation, achieved a testing accuracy of 98.05%, a recall of 95.03%, a precision of 96.52%, and an F1 score of 95.97%. The confusion matrix is presented in Fig. 6. Using a deep learning model with the conversion of features to the image domain improved the testing accuracy by 25.35 percentage points over the ensemble algorithm, which achieved 72.7% testing accuracy.

Fig. 6 — Confusion matrix for the proposed DCNN model using feature images

The progress of the training phase of the proposed deep learning model is presented in Fig. 7, which shows how accuracy improves as training proceeds; the model was configured to stop training early if no better accuracy was achieved within 10 iterations. The batch size was 32 with a learning rate of 0.0001. Examples of testing accuracy along with treatment classification are presented in Fig. 8.
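The early-stopping rule can be sketched as a patience counter over a sequence of accuracy values. Whether accuracy was checked per iteration or per epoch, and on which data, is not stated in the text; the function name and the accuracy values below are hypothetical.

```python
def early_stopping_step(accuracies, patience=10):
    """Return the index at which training stops, or None if it runs to the end.

    Training stops once `patience` consecutive checks fail to beat the best accuracy.
    """
    best = float("-inf")
    waited = 0
    for step, accuracy in enumerate(accuracies):
        if accuracy > best:
            best = accuracy
            waited = 0
        else:
            waited += 1
            if waited >= patience:
                return step
    return None


# Accuracy improves twice and then stays flat for ten checks.
stop_at = early_stopping_step([0.5, 0.6] + [0.6] * 10)
never = early_stopping_step([0.1, 0.2, 0.3, 0.4])
```

In the first sequence the best accuracy is reached at index 1 and the tenth check without improvement occurs at index 11, where training stops.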

Fig. 8 — Examples of the testing accuracy for treatment classification

Treatment concentration prediction results

Another goal of the proposed model is to predict the concentration of the treatment on the cell. The first direction is to use machine learning algorithms to predict the concentration level of the treatment. Three concentration levels are investigated: 10, 30, and 100. Table 4 presents the testing accuracy of treatment concentration prediction using DT, SVM, and ensemble algorithms with 5-fold cross-validation.

Table 4.

Testing accuracy using different machine learning algorithms

Family algorithm	DT	SVM	Ensemble
Child-algorithm (best-achieved accuracy)	Coarse tree (Damrongsakmethee and Neagoe 2019)	Linear SVM (Chang and Lin 2008)	Bagged tree (Banfield et al. 2006)
Average testing accuracy	96.4%	97.3%	98.5%


ROC curves and AUC are additional indicators of classifier quality. Figure 9 presents the ROC curves of the different machine learning algorithms for the treatment concentration levels 10, 30, and 100. The SVM and the ensemble algorithms achieved an AUC of 100%, which is a good indicator of classifier quality. Also, according to Table 4, the two classifiers (SVM and ensemble) achieved testing accuracies of 97.3% and 98.5%, respectively, for a three-class problem.

Fig. 9 — ROC and AUC for machine learning algorithms for the treatment concentration level prediction for a 10, b 30, and c 100 treatment concentration level

The second direction is to use deep learning to solve this problem, applying the same proposed DCNN model to the feature images without augmentation. There was no need for the augmentation process, as the proposed model achieved a good testing accuracy of 98.2%. Figure 10 presents the confusion matrix for the concentration level of the potential treatment. Along with the 98.2% testing accuracy, the proposed model with the conversion of features to images achieved the following performance metrics: recall 87.42%, precision 99.36%, and F1 score 93.01%.

Fig. 10 — Confusion matrix for the treatment concentration level prediction

For concentration level 10, the achieved accuracy was 98.1%; for concentration level 30, it was 100%; and for concentration level 100, it was also 100%. The achieved accuracy for every class reflects the performance of the proposed DCNN model.

Result discussion

For treatment classification, which includes 32 classes, the proposed DCNN achieved a superior result compared with the machine learning algorithms in terms of testing accuracy. The proposed DCNN achieved 98.05%, while classical machine learning algorithms such as DT, SVM, and ensemble achieved 57.7%, 71.5%, and 72.7%, respectively. The performance metrics supported the results obtained for the proposed DCNN with feature image conversion.

In the treatment concentration level prediction, classical machine learning algorithms such as DT and SVM achieved results close to those of the proposed DCNN. The DT and SVM achieved 96.4% and 97.3%, respectively, while the DCNN achieved 98.2% in testing accuracy. The ensemble algorithm achieved a higher testing accuracy than the DCNN, at 98.5%. As a general observation, classical machine learning algorithms are adequate for simple classification problems such as treatment concentration level prediction, which includes three classes, while in multiclass classification with many classes, such as treatment classification with 32 classes, the deep learning model proved its performance and efficiency compared with classical machine learning.

Conclusion and future works

The coronavirus pandemic is putting healthcare systems around the world into a critical situation. Until now, there is no cure for this virus. One of the methods that can help to defeat this virus is trying approved treatments on human cells as a primary step to shorten the gap between treatments and finding an actual cure. Computer algorithms and deep learning can close that gap and help in finding a cure. In this paper, a deep learning model and machine learning methods for the classification of potential coronavirus treatments on a single human cell were presented. The dataset selected in this work is a subset of the publicly available dataset on RxRx.ai. The objective of this research is to automatically classify the human cell according to treatment and treatment concentration level. The proposed DCNN model and methodology are based on converting the numerical features of the original dataset to the image domain. The proposed model consists of three convolutional layers, three ReLU layers, three pooling layers, and two fully connected layers. The experimental results showed that the proposed DCNN model for treatment classification (32 classes) achieved 98.05% testing accuracy, outperforming classical machine learning methods such as support vector machine, decision tree, and ensemble. In treatment concentration level prediction, the classical machine learning (ensemble) algorithm achieved 98.5% testing accuracy while the proposed DCNN model achieved 98.2%. One direction for future work is to perform the same experiments with deep transfer models such as AlexNet and ResNet50, or even deeper neural networks, to investigate their performance on the dataset used in this research.

Funding

This research received no external funding.

Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest.

Footnotes

This article is part of the topical collection: Role of Nanotechnology and Internet of Things in Healthcare

Guest Editors: Florian Heberle, Steve Bull and John Fitzgerald

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Nour Eldeen M. Khalifa, Email: nourmahmoud@cu.edu.eg

Mohamed Hamed N. Taha, Email: mnasrtaha@cu.edu.eg

Gunasekaran Manogaran, Email: gmanogaran@ieee.org

Mohamed Loey, Email: mloey@fci.bu.edu.eg

References

Ardakani AA, Kanafi AR, Acharya UR, Khadem N, Mohammadi A. Application of deep learning technique to manage COVID-19 in routine clinical practice using CT images: results of 10 convolutional neural networks. Comput Biol Med. 2020;121:103795. doi: 10.1016/j.compbiomed.2020.103795. [DOI] [PMC free article] [PubMed] [Google Scholar]
Bagasta AR, Rustam Z, Pandelaki J, Nugroho WA (2019) Comparison of cubic SVM with Gaussian SVM: classification of infarction for detecting ischemic stroke, in IOP Conference Series: Materials Science and Engineering, vol. 546, no. 5, p. 052016
Banfield RE, Hall LO, Bowyer KW, Kegelmeyer WP. A comparison of decision tree ensemble creation techniques. IEEE Trans Pattern Anal Mach Intell. 2006;29(1):173–180. doi: 10.1109/TPAMI.2007.250609. [DOI] [PubMed] [Google Scholar]
Çayir A, Yenidoğan I, Dağ H (2018) Feature extraction based on deep learning for some traditional machine learning methods, in 2018 3rd International Conference on Computer Science and Engineering (UBMK), 2018, pp. 494–497, 10.1109/UBMK.2018.8566383
Chamola V, Hassija V, Gupta V, Guizani M. A comprehensive review of the COVID-19 pandemic and the role of IoT, drones, AI, blockchain, and 5G in managing its impact. IEEE Access. 2020;8:90225–90265. doi: 10.1109/ACCESS.2020.2992341. [DOI] [Google Scholar]
Chang Y-W, Lin C-J. Causation and prediction challenge. 2008. Feature ranking using linear SVM; pp. 53–64. [Google Scholar]
Chang L, Yan Y, Wang L (2020) Coronavirus disease 2019: coronaviruses and blood safety. Transfus Med Rev. 10.1016/j.tmrv.2020.02.003 [DOI] [PMC free article] [PubMed]
Civit-Masot J, Luna-Perejón F, Domínguez Morales M, Civit A. Deep learning system for COVID-19 diagnosis aid using X-ray pulmonary images. Appl Sci. 2020;10(13):13. doi: 10.3390/app10134640. [DOI] [Google Scholar]
Coronavirus (COVID-19) map (2020). https://www.google.com/covid19-map/ (accessed Apr. 26, 2020)
Damrongsakmethee T, Neagoe V-E (2019) Principal component analysis and relieff cascaded with decision tree for credit scoring, in Computer Science On-line Conference, pp. 85–95
El-Sawy A, Loey M, EL-Bakry H (2017a) Arabic handwritten characters recognition using convolutional neural network. WSEAS Trans Comput Res 5 Accessed: Apr. 01, 2020. [Online]. Available: http://www.wseas.org/multimedia/journals/computerresearch/2017/a045818-075.php
El-Sawy A, El-Bakry H, Loey M (2017b) CNN for handwritten Arabic digits recognition based on LeNet-5 BT - Proceedings of the International Conference on Advanced Intelligent Systems and Informatics 2016, Cham, pp. 566–575
Giovanetti M, Benvenuto D, Angeletti S, Ciccozzi M. The first two cases of 2019-nCoV in Italy: where they come from? J Med Virol. 2020;92(5):518–521. doi: 10.1002/jmv.25699. [DOI] [PMC free article] [PubMed] [Google Scholar]
Goutte C, Gaussier E. A probabilistic interpretation of precision, recall and F-score, with implication for evaluation. 2010. [Google Scholar]
Hang R, Liu Q, Song H, Sun Y. Matrix-based discriminant subspace ensemble for hyperspectral image spatial–spectral feature fusion. IEEE Trans Geosci Remote Sens. 2015;54(2):783–794. doi: 10.1109/TGRS.2015.2465899. [DOI] [Google Scholar]
Heiser K et al (2020) Identification of potential treatments for COVID-19 through artificial intelligence-enabled phenomic analysis of human cells infected with SARS-CoV-2. bioRxiv
Holshue ML, DeBolt C, Lindquist S, Lofy KH, Wiesman J, Bruce H, Spitters C, Ericson K, Wilkerson S, Tural A, Diaz G, Cohn A, Fox L, Patel A, Gerber SI, Kim L, Tong S, Lu X, Lindstrom S, Pallansch MA, Weldon WC, Biggs HM, Uyeki TM, Pillai SK, Washington State 2019-nCoV Case Investigation Team First case of 2019 novel coronavirus in the United States. N Engl J Med. 2020;382(10):929–936. doi: 10.1056/NEJMoa2001191. [DOI] [PMC free article] [PubMed] [Google Scholar]
Jogin M, Mohana, Madhulika MS, Divya GD, Meghana RK, Apoorva S (2018) Feature extraction using convolution neural networks (CNN) and deep learning, in 2018 3rd IEEE International Conference on Recent Trends in Electronics, Information Communication Technology (RTEICT), pp. 2319–2323, 10.1109/RTEICT42901.2018.9012507
Khalifa NEM, Taha MHN, Hassanien AE. International Conference on Advanced Intelligent Systems and Informatics. 2018. Aquarium family fish species identification system using deep neural networks; pp. 347–356. [Google Scholar]
Khalifa N, Loey M, Taha M, Mohamed H. Deep transfer learning models for medical diabetic retinopathy detection. Acta Inform Med. 2019;27(5):327. doi: 10.5455/aim.2019.27.327-332. [DOI] [PMC free article] [PubMed] [Google Scholar]
Khalifa NEM, Taha MHN, Hassanien AE, Hemedan AA. Deep bacteria: robust deep learning data augmentation design for limited bacterial colony dataset. Int J Reason Based Intell Syst. 2019;11(3):256–264. [Google Scholar]
Khalifa NEM, Taha MHN, Ali DE, Slowik A, Hassanien AE. Artificial intelligence technique for gene expression by tumor RNA-Seq data: a novel optimized deep learning approach. IEEE Access. 2020;8:22874–22883. doi: 10.1109/ACCESS.2020.2970210. [DOI] [Google Scholar]
Khattar A, Jain PR, Quadri SMK (2020) Effects of the disastrous pandemic COVID 19 on learning styles, activities and mental health of young Indian students - a machine learning approach, in 2020 4th International Conference on Intelligent Computing and Control Systems (ICICCS), pp. 1190–1195, 10.1109/ICICCS48265.2020.9120955
Kleinbaum DG, Klein M (2002) Logistic regression: a self-learning text, 2nd edn. Springer-Verlag, New York
Lai C-C, Shih T-P, Ko W-C, Tang H-J, Hsueh P-R. Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and coronavirus disease-2019 (COVID-19): the epidemic and the challenges. Int J Antimicrob Agents. 2020;55(3):105924. doi: 10.1016/j.ijantimicag.2020.105924. [DOI] [PMC free article] [PubMed] [Google Scholar]
Li J, Li J(J), Xie X, Cai X, Huang J, Tian X, Zhu H. Game consumption and the 2019 novel coronavirus. Lancet Infect Dis. 2020;20(3):275–276. doi: 10.1016/S1473-3099(20)30063-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
Loey M, Smarandache F, Khalifa NEM. Within the lack of chest COVID-19 X-ray dataset: a novel detection model based on GAN and deep transfer learning. Symmetry. 2020;12(4):4. doi: 10.3390/sym12040651. [DOI] [Google Scholar]
Loey M, ElSawy A, Afify M (2020b) Deep learning in plant diseases detection for agricultural crops: a survey. Int J Serv Sci Manag Eng Technol (IJSSMET) www.igi-global.com/article/deep-learning-in-plant-diseases-detection-for-agricultural-crops/248499 (accessed Apr. 11, 2020)
Loey M, Naman MR, Zayed HH (2020c) A survey on blood image diseases detection using deep learning. Int J Serv Sci Manag Eng Technol (IJSSMET) www.igi-global.com/article/a-survey-on-blood-image-diseases-detection-using-deep-learning/256653 (accessed Jun. 17, 2020)
Loey M, Naman M, Zayed H. Deep transfer learning in diagnosing leukemia in blood cells. Computers. 2020;9(2):2. doi: 10.3390/computers9020029. [DOI] [Google Scholar]
Mangalova E, Agafonov E. Wind power forecasting using the k-nearest neighbors algorithm. Int J Forecast. 2014;30(2):402–406. doi: 10.1016/j.ijforecast.2013.07.008. [DOI] [Google Scholar]
Narayan Das N, Kumar N, Kaur M, Kumar V, Singh D (2020, IRBM) Automated deep transfer learning-based approach for detection of COVID-19 infection in chest X-rays. 10.1016/j.irbm.2020.07.001 [DOI] [PMC free article] [PubMed]
Naseem I, Togneri R, Bennamoun M. Linear regression for face recognition. IEEE Trans Pattern Anal Mach Intell. 2010;32(11):2106–2112. doi: 10.1109/TPAMI.2010.128. [DOI] [PubMed] [Google Scholar]
Navada A, Ansari AN, Patil S, Sonkamble BA (2011) Overview of use of decision tree algorithms in machine learning, in 2011 IEEE Control and System Graduate Research Colloquium, pp. 37–42, 10.1109/ICSGRC.2011.5991826
Polikar R. Ensemble learning. In: Zhang C, Ma Y, editors. Ensemble machine learning: methods and applications. Boston, MA: Springer US; 2012. pp. 1–34. [Google Scholar]
Rothe C, Schunk M, Sothmann P, Bretzel G, Froeschl G, Wallrauch C, Zimmer T, Thiel V, Janke C, Guggemos W, Seilmaier M, Drosten C, Vollmar P, Zwirglmaier K, Zange S, Wölfel R, Hoelscher M. Transmission of 2019-nCoV infection from an asymptomatic contact in Germany. N Engl J Med. 2020;382(10):970–971. doi: 10.1056/NEJMc2001468.
Sharfstein JM, Becker SJ, Mello MM (2020) Diagnostic testing for the novel coronavirus. JAMA. doi: 10.1001/jama.2020.3864
Shorten C, Khoshgoftaar TM. A survey on image data augmentation for deep learning. J Big Data. 2019;6(1):60. doi: 10.1186/s40537-019-0197-0.
Singhal T. A review of coronavirus disease-2019 (COVID-19). Indian J Pediatr. 2020;87(4):281–286. doi: 10.1007/s12098-020-03263-6.
Tu P-L, Chung J-Y (1992) A new decision-tree classification algorithm for machine learning, in Proceedings Fourth International Conference on Tools with Artificial Intelligence TAI '92, pp. 370–377, 10.1109/TAI.1992.246431
Waheed A, Goyal M, Gupta D, Khanna A, Al-Turjman F, Pinheiro PR. CovidGAN: data augmentation using auxiliary classifier GAN for improved Covid-19 detection. IEEE Access. 2020;8:91916–91923. doi: 10.1109/ACCESS.2020.2994762.
Worldometer (2020) Countries where Coronavirus has spread – Worldometer. https://www.worldometers.info/coronavirus/countries-where-coronavirus-has-spread/ (accessed Jul. 10, 2020)
Xiao Y, Wu J, Lin Z, Zhao X. A deep learning-based multi-model ensemble method for cancer prediction. Comput Methods Prog Biomed. 2018;153:1–9. doi: 10.1016/j.cmpb.2017.09.005.

