Table 2.
Data science tools and techniques for SARS-CoV-2 data analysis
Task | Data type | Data science models | Available tools |
---|---|---|---|
Phylogeny/alignment | Nucleotide/protein sequence | UPGMA, WPGMA, neighbour-joining, maximum likelihood, Fitch–Margoliash method, maximum parsimony, Bayesian inference | ClustalW, Clustal Omega |
Structure prediction | Protein sequence | Deep neural network (NeBcon, ResPRE, ResTriplet and TripletRes), QSQE, supervised machine learning (SVM), multiple regression | SWISS-MODEL [79], PyMOL [80], I-TASSER [81], COMPOSER [78] |
SARS-CoV-2 predictor | Nucleotide sequence | Conventional models (Naïve Bayes, k-nearest neighbors, artificial neural networks, decision tree and support vector machine), deep models (CNN and bi-path CNN (BiPathCNN)) | COVID-Predictor [132] |
Protein interactions | Protein sequence, PPI networks, protein structure | Graph analysis | Cytoscape https://apps.cytoscape.org/ |
Chest imaging analysis | Chest X-ray or CT image | Deep learning models (VGG19, MobileNet, Inception, Xception and Inception-ResNet (v2, 18, 23, 50)), GAN, Dice similarity coefficient (DSC) | TrainingData.io https://www.trainingdata.io/ |
Epidemic trend analysis | Experimental and observational data | LSTM, statistical models (SIR, Bayesian imputation, linear and polynomial regression) | Worldometers-coronavirus https://www.worldometers.info/coronavirus/; WHO-COVID19-report https://www.who.int/emergencies/diseases/novel-coronavirus-2019; COVID-19 Projections https://covid19-projections.com/ |
Drug interaction and repurposing | Protein sequence, drug molecules, protein structure | Graph analysis, graph convolutional network | DeepDR [133], kGCN [134], DeepChem [135], D3Targets-2019-nCoV [136], CoVex [137] |
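
As an illustration of one entry in the table, the SIR model listed under epidemic trend analysis can be sketched in a few lines of Python. This is a minimal sketch, not the method of any tool cited above: the function name `simulate_sir`, the forward Euler integration, and all parameter values (`beta`, `gamma`, population size) are assumptions chosen for illustration and are not fitted to SARS-CoV-2 data.

```python
def simulate_sir(population, initial_infected, beta, gamma, days, dt=0.1):
    """Integrate the SIR equations with forward Euler steps.

    dS/dt = -beta * S * I / N
    dI/dt =  beta * S * I / N - gamma * I
    dR/dt =  gamma * I
    """
    s = float(population - initial_infected)  # susceptible
    i = float(initial_infected)               # infected
    r = 0.0                                   # recovered/removed
    steps_per_day = round(1 / dt)
    history = [(s, i, r)]  # one (S, I, R) tuple per day, starting at day 0
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / population * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        history.append((s, i, r))
    return history


# Hypothetical parameters for illustration only (basic reproduction number beta/gamma = 3).
trajectory = simulate_sir(population=1_000_000, initial_infected=10,
                          beta=0.3, gamma=0.1, days=180)
peak_day = max(range(len(trajectory)), key=lambda d: trajectory[d][1])
print(f"Peak infections on day {peak_day}: {trajectory[peak_day][1]:.0f}")
```

In practice, `beta` and `gamma` would be estimated from reported case counts such as those published by the data sources in the table, and a dedicated ODE solver would likely replace the fixed-step Euler loop.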