2021 Feb 17:bbaa420. doi: 10.1093/bib/bbaa420

Table 2. Data science tools and techniques for SARS-CoV-2 data analysis

| Task | Data type | Data science models | Available tools |
|---|---|---|---|
| Phylogeny/alignment | Nucleotide/protein sequence | UPGMA, WPGMA, neighbour-joining, maximum likelihood, Fitch–Margoliash method, maximum parsimony, Bayesian inference | ClustalW, Clustal Omega, MAFFT, MUSCLE, T-Coffee (https://www.ebi.ac.uk/Tools/, https://www.genome.jp/tools-bin/clustalw); DNAMAN (https://www.lynnon.com/dnaman.html) |
| Structure prediction | Protein sequence | Deep neural networks (NeBcon, ResPRE, ResTriplet and TripletRes), QSQE, supervised machine learning (SVM), multiple regression | SWISS-MODEL [79], PyMOL [80], I-TASSER [81], COMPOSER [78] |
| SARS-CoV-2 prediction | Nucleotide sequence | Conventional models (naïve Bayes, k-nearest neighbors, artificial neural networks, decision tree and support vector machine); deep models (CNN, bi-path CNN (BiPathCNN)) | COVID-Predictor [132] |
| Protein interactions | Protein sequence, PPI networks, protein structure | Graph analysis | Cytoscape (https://apps.cytoscape.org/) |
| Chest imaging analysis | Chest X-ray or CT image | Deep learning models (VGG19, MobileNet, Inception, Xception and Inception ResNet (v2, 18, 23, 50)), GAN; Dice similarity coefficient (DSC) | TrainingData.io (https://www.trainingdata.io/) |
| Epidemic trend analysis | Experimental and observational data | LSTM; statistical models (SIR, Bayesian imputation, linear and polynomial regression) | Worldometers coronavirus (https://www.worldometers.info/coronavirus/); WHO COVID-19 reports (https://www.who.int/emergencies/diseases/novel-coronavirus-2019); COVID-19 Projections (https://covid19-projections.com/) |
| Drug interaction and repurposing | Protein sequence, drug molecules, protein structure | Graph analysis, graph convolutional network | DeepDR [133], kGCN [134], DeepChem [135], D3Targets-2019-nCoV [136], CoVex [137] |
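To illustrate one of the distance-based methods in the Phylogeny/alignment row of Table 2, the following is a minimal UPGMA sketch in plain Python. It assumes a precomputed pairwise distance for every pair of taxa (for example, derived from a multiple sequence alignment produced by one of the listed tools); the function name `upgma` and the toy distances are hypothetical and are not taken from any of the tools in the table.

```python
from itertools import combinations


def upgma(labels, dist):
    """Build a rooted UPGMA tree and return it as a Newick string.

    labels: list of taxon names.
    dist: dict mapping (a, b) tuples to the distance between taxa a and b
          (hypothetical input format chosen for this sketch).
    """
    # Active clusters: key -> (newick subtree, number of taxa, node height)
    clusters = {lab: (lab, 1, 0.0) for lab in labels}
    d = {frozenset(pair): value for pair, value in dist.items()}

    while len(clusters) > 1:
        # Merge the closest pair of active clusters
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        na, sa, ha = clusters.pop(a)
        nb, sb, hb = clusters.pop(b)
        h = d[frozenset((a, b))] / 2.0  # ultrametric height of the new node
        merged = f"({na}:{h - ha:g},{nb}:{h - hb:g})"

        # UPGMA: distance to the merged cluster is the size-weighted mean.
        # (WPGMA would use the unweighted mean of the two distances instead.)
        for c in clusters:
            d[frozenset((merged, c))] = (
                sa * d[frozenset((a, c))] + sb * d[frozenset((b, c))]
            ) / (sa + sb)
        clusters[merged] = (merged, sa + sb, h)

    newick = next(iter(clusters.values()))[0]
    return newick + ";"


print(upgma(["A", "B", "C"], {("A", "B"): 2, ("A", "C"): 6, ("B", "C"): 6}))
```

The weighted average is what distinguishes UPGMA from WPGMA, which is why the comment in the merge step notes the one-line change needed for the latter.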
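The SARS-CoV-2 prediction row of Table 2 lists k-nearest neighbors among the conventional classifiers for nucleotide sequences. The sketch below shows one common way such a classifier can be set up, assuming k-mer frequency vectors as features and Euclidean distance; both choices, the helper names and the toy sequences and labels are assumptions for illustration and do not describe how COVID-Predictor [132] works.

```python
import math
from collections import Counter
from itertools import product


def kmer_profile(seq, k=3):
    """Relative frequency of every DNA k-mer (assumed feature encoding)."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = max(sum(counts[km] for km in kmers), 1)  # ignore non-ACGT k-mers
    return [counts[km] / total for km in kmers]


def knn_predict(train, query, k_neighbors=3, k=3):
    """Majority vote among the k_neighbors training sequences closest to query."""
    q = kmer_profile(query, k)
    ranked = sorted((math.dist(kmer_profile(s, k), q), label) for s, label in train)
    votes = Counter(label for _, label in ranked[:k_neighbors])
    return votes.most_common(1)[0][0]


# Hypothetical toy training data: (sequence, class label)
train = [
    ("ATATATATAT", "X"),
    ("ATATATTATA", "X"),
    ("GCGCGCGCGC", "Y"),
    ("GCGCGGCGCG", "Y"),
]
print(knn_predict(train, "ATATATATTA"))
```

Real genomes would need a larger k, a much larger labelled set and a held-out evaluation; the sketch only shows the feature-then-vote structure.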
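For the Protein interactions row of Table 2, "graph analysis" commonly starts with node-level statistics such as degree centrality, which flags highly connected hub proteins in a PPI network. The following is a minimal sketch assuming an undirected edge list; the protein names `P1` to `P5` are hypothetical placeholders, and the normalization (degree divided by n − 1) is one standard convention rather than a setting taken from Cytoscape.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Normalized degree centrality for an undirected interaction network.

    Nodes are taken from the edge list only; self-loops are ignored.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue  # a self-interaction adds no neighbour
        adj[u].add(v)
        adj[v].add(u)
    n = len(adj)
    if n < 2:
        return {node: 0.0 for node in adj}
    return {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}


# Hypothetical PPI edge list
edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P4", "P5")]
centrality = degree_centrality(edges)
print(max(centrality, key=centrality.get))
```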
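The Chest imaging analysis row of Table 2 lists the Dice similarity coefficient (DSC), which scores how well a predicted segmentation mask overlaps a reference mask: DSC = 2|P ∩ T| / (|P| + |T|). The sketch below computes it for binary masks given as nested Python lists; treating two empty masks as a perfect match (DSC = 1) is an assumed convention, and the example masks are made up.

```python
def dice_coefficient(pred, truth):
    """DSC = 2|P ∩ T| / (|P| + |T|) for binary masks stored as nested lists."""
    p = [bool(v) for row in pred for v in row]
    t = [bool(v) for row in truth for v in row]
    if len(p) != len(t):
        raise ValueError("masks must contain the same number of pixels")
    intersection = sum(a and b for a, b in zip(p, t))
    total = sum(p) + sum(t)
    # Assumed convention: two empty masks agree perfectly
    return 1.0 if total == 0 else 2.0 * intersection / total


pred = [[1, 1, 0], [0, 1, 0]]   # hypothetical model output
truth = [[1, 0, 0], [0, 1, 1]]  # hypothetical annotation
print(dice_coefficient(pred, truth))
```

Here two of the three foreground pixels in each mask coincide, so the score is 2 × 2 / (3 + 3) ≈ 0.67.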
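Among the statistical models in the Epidemic trend analysis row of Table 2, the SIR model describes a closed population of size N split into susceptible (S), infected (I) and recovered (R) compartments, with dS/dt = −βSI/N, dI/dt = βSI/N − γI and dR/dt = γI. The sketch below integrates these equations with a simple forward-Euler scheme; the parameter values are illustrative assumptions, not estimates fitted to SARS-CoV-2 data.

```python
def simulate_sir(beta, gamma, s0, i0, r0, days, steps_per_day=10):
    """Forward-Euler integration of the SIR model.

    Returns a list of (S, I, R) tuples, one per day, including day 0.
    """
    n = s0 + i0 + r0
    dt = 1.0 / steps_per_day
    s, i, r = float(s0), float(i0), float(r0)
    daily = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / n * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        daily.append((s, i, r))
    return daily


# Illustrative parameters: beta / gamma = 3, so the outbreak initially grows
trajectory = simulate_sir(beta=0.3, gamma=0.1, s0=990, i0=10, r0=0, days=30)
print(trajectory[-1])
```

Because every step moves people between compartments without creating or removing them, S + I + R stays equal to N; the ratio β/γ (the basic reproduction number in this model) decides whether I initially rises or falls.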