2021 Mar 12;22(6):2903. doi: 10.3390/ijms22062903

Table 1.

Representative problems in four bioinformatics areas, and the methods that address them by combining machine learning (ML) with bioinformatics tools.

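| Bioinformatics Area | Problem Category | Goal | ML Method | Bioinformatics Tools |
|---|---|---|---|---|
| Molecular evolution | Biological sequence clustering | Protein family prediction | CNN | Clusters of Orthologous Groups (COGs) and G protein-coupled receptor (GPCR) dataset [30] |
| Molecular evolution | Biological sequence clustering | Protein function prediction | Deep RNN | BLAST and HMMER search [32] |
| Molecular evolution | Biological sequence clustering | Anti-CRISPR protein identification | Random forest | MSA and PSI-BLAST [24] |
| Molecular evolution | Biological sequence clustering | Anti-CRISPR protein identification | eXtreme Gradient Boosting | K-mer-based clustering (CD-HIT), BLAST [25] |
| Molecular evolution | Biological sequence clustering | Viral pathogenicity feature identification | SVM | MSA, phylogenetic tree construction [20,21] |
| Molecular evolution | Alignment-free biological sequence analysis | Identification of viral genomes | RNN | BLAST, sequence clustering, HHPRED [27] |
| Molecular evolution | Alignment-free biological sequence analysis | Identification of viral genomes | CNN | BLAST [28] |
| Protein structure analysis | Post-translational modifications | Phosphorylation site prediction | KNN | Local sequence similarity [53] |
| Protein structure analysis | Post-translational modifications | Phosphorylation site prediction | CNN | K-mer-based clustering (CD-HIT), BLAST [55] |
| Protein structure analysis | Post-translational modifications | Glycosylation site prediction | Ensemble SVM | Curated glycosylated protein database (O-GLYCBASE) [54] |
| Protein structure analysis | Protein structure prediction | Protein contact prediction | CNN | MSA [72] |
| Protein structure analysis | Protein structure prediction | Prediction of distances between pairs of residues | CNN | MSA, HHPRED, PSI-BLAST [77] |
| Systems biology | Inference of biological networks | Gene regulatory network prediction | SVM | GeneNetWeaver, RegulonDB [81] |
| Systems biology | Inference of biological networks | Protein-protein interaction network prediction | SVM | Domain affinity and frequency tables [90] |
| Systems biology | Inference of biological networks | Protein-protein interaction network prediction | Elastic-net regression | Protein descriptors [91] |
| Systems biology | Analysis of biological networks | Drug target prediction | K-means | Network analysis tools [98] |
| Systems biology | Analysis of biological networks | Drug side effect prediction | SVM | Genome-scale metabolic modeling [112] |
| Systems biology | Analysis of biological networks | Drug synergism prediction | Random forest ensemble | A chemical-genetic interaction matrix [117] |
| Systems biology | Multi-omics integration | Cancer subtype prediction | Neighborhood-based clustering | Similarity-based integration [141] |
| Systems biology | Multi-omics integration | Drug response prediction | Logistic regression | Cancer hallmarks datasets, pathway data [144] |
| Biomarker analysis for disease research | Disease-associated genes investigation | Pulmonary sarcoidosis gene identification | Hierarchical clustering | Differential expression analysis [150] |
| Biomarker analysis for disease research | Disease-associated genes investigation | Identification of miRNA-disease association | NMF | Disease semantic information and miRNA functional information [151] |
| Biomarker analysis for disease research | Disease-associated genes investigation | Disease-phenotype visualization | t-SNE | OMIM database and human disease networks [154] |
| Biomarker analysis for disease research | Biomarker discovery | Cancer diagnosis | SVM | Reference gene selection [170] |
| Biomarker analysis for disease research | Biomarker discovery | Biomarker signature identification | SVM | Network-based gene selection [167] |
| Biomarker analysis for disease research | Biomarker discovery | Cancer outcome prediction | Random forest | Evolutionary conservation estimation [181] |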