TABLE 1.
Features of recent methodologies and techniques used for AI-based systems biology approaches in multi-omics data analysis of cancer.
Methodology | Techniques | Characteristics | Specialty | Cancer types | Omics data | Outcome | Performance | References |
Unsupervised and supervised | Stacked autoencoder and hierarchical integration deep flexible neural forest network (HI-DFNForest) | Autoencoders are used to integrate multi-omics data. HI-DFNForest is used for classification. | Considers intrinsic statistical properties and learns high-level representations of each omics data type. The HI-DFNForest model is suitable for small-scale data. | Breast, glioblastoma, ovarian cancer | mRNA expression, miRNA expression, methylation | Classify cancer subtypes | Accuracy: 0.885 (glioblastoma multiforme) | (33) |
Supervised and unsupervised | Deep learning, autoencoder | Autoencoder was used to reduce data dimensionality, followed by SVM to find sub-groups. | Predicts survival subgroups and aggregates genes belonging to similar pathways. | Hepatocellular carcinoma | mRNA expression, miRNA expression, methylation | Predict survival subgroups | Concordance index: 0.68 | (34, 35) |
Unsupervised | Combinations of autoencoders | Data integration based on four types of variational autoencoders (VAE) | All VAE architectures perform well. Learned representations coupled with SVMs provide the best prediction. | Breast cancer | mRNA expression, CNV data | Focused on data integration approaches | Accuracy: 0.858 | (37) |
Kernel framework | Multiple kernel learning | Combines several kernels into one meta-kernel in an unsupervised framework. | Identifies cancer subtypes and provides relationships between them | Breast cancer | mRNA expression, miRNA expression, methylation | Proposed generic approach of data integration | Average cluster purity: 0.70 | (39) |
Unsupervised | Autoencoder | Multi-modal sparse denoising autoencoder framework coupled with sparse non-negative matrix factorization | Illustrates the impact of individual omics features on pathway scores. | Colorectal cancer, lung squamous cell carcinoma, glioblastoma multiforme and breast cancer | mRNA expression, miRNA expression, methylation, CNV | Cluster patients and provide feature pathways for patient clusters | Consensus silhouette index: 0.98 (colorectal cancer) | (42) |
Supervised and unsupervised | Random forest, SVM | Combined use of random forest and SVM | Classifies normal and cancer samples across different tissue types and hence useful for diagnosis | 9 types of cancers | Pan-cancer mRNA expression | Classifies samples and identifies biomarkers | Accuracy: 97.89% (non-specific tissue type) | (43) |
Unsupervised | Autoencoder | Three types of integration approaches were used. Feature combinations with the highest average predictive accuracy were used. | Autoencoder-based classification | Neuroblastoma | mRNA expression, CNV | Prognosis sub-group | p-value from Kaplan-Meier curves for overall survival: 2.8e-8 | (44) |
Supervised and unsupervised | Integrative network fusion and deep learning | Random forest was trained on two types of integrated omics data. Classifier was used based on the intersection of the two training processes. | Two approaches were followed for data integration: juxtaposition and integration by similarity network fusion. | Neuroblastoma | mRNA expression, CNV | Prognosis sub-group | p-value for Kaplan-Meier plot: 5.7e-4 | (45) |
Supervised and unsupervised | SVM and random forest | Initial supervised analysis was followed by a systems biology approach and random forest-based analysis | Multi-omics data was integrated in multiple steps with removal of redundant features. | Colorectal cancer | mRNA expression, miRNA expression, CNV, metabolomics | Identifies markers and pathways associated with cancer relapse | p-value from Kaplan-Meier curves for overall survival: 5.7e-4 | (46) |
Multi-view learning | Min-Redundancy and Max-Relevance (MRMR) | Finds features having maximum relevance in feature selection and minimum redundancy with already selected features | Two stage feature selection framework | Ovarian cancer | mRNA expression, methylation, CNV | Identifies biomarkers for predicting survival. | Area under curve (AUC): 0.7 for random forest classifier | (47) |
Neural network | Deep learning based neural network | Instead of gene expression data, eigengene modules of gene co-expression analysis were used as features. | Associates feature genes with metadata like age | Breast cancer | mRNA expression, miRNA expression, methylation, CNV and other metadata | Survival prediction | Mean concordance index: 0.6813 | (48) |
LASSO and neural network | Deep learning framework and LASSO | Uses group LASSO and deep neural network for data integration and then Cox model for survival prediction | Different features from the same gene are grouped together | Pan-cancer | mRNA expression, CNV, SNP | Survival prediction | Concordance index: 0.8 | (49) |
Kernel method | Kernel alignment assessment of omic similarity matrix | Omic similarity matrix was constructed for each omics data and similarity between them was measured. | Considers involvement of large number of biomarkers in disease prognosis | Pan-cancer | mRNA expression, miRNA expression, methylation, CNV, SNP | Variation in prognosis assessment across cancer types | Concordance index >0.68 (sample size = 900) | (50) |
Kernel-based and feature-selection-based | Bayesian efficient multiple kernel learning (BEMKL) model | Kernelized regression which works on similarities between cell lines | Reduces number of model parameters to match number of samples, not feature numbers. Extracts non-linear relations between features and drug response. | Breast cancer cell lines | mRNA expression, CNV, methylation, SNP, proteomic | Drug-response prediction | False discovery rate: 2.5e-5 | (51, 52, 57) |
Deep neural network, transfer learning | Multi-Omics Late Integration (MOLI) | Creates a feature space for each omics data type. Learned features are integrated by concatenation and used for prediction of drug response. Uses transfer learning by training on responses of all drugs for the same target. | Considers unique distribution for each omics data type. | Pan-cancer | mRNA expression, CNV, SNP | Predicts drug response | Accuracy: 0.8 for drug cetuximab | (53) |
Supervised | SVM and leave-one-out cross-validation (LOOCV) | Finds features from each omics data type and then identifies marker candidates based on miRNA and mRNA interactions | Analyzed integrated mRNA and miRNA expression data considering their interactions | Pancreatic ductal carcinoma | mRNA expression, miRNA expression | Identifies mRNA and miRNA markers. Predicts miRNA expression level | AUC: 0.925 for miR-21 as multi-marker | (54) |
Supervised | idTRAX | Finds target kinases from the compound data of all genes | Identifies kinases as effective targets of drugs | Breast cancer | Genomic and transcriptomic | Cell-model selective anti-cancer drug target | Spearman correlation ∼0.1 | (55) |
Supervised | Capsule network based modeling (CapsNetMMD) | Multi-omics data is integrated to form feature matrix and converted to capsule layers by convolution. | Supervised classification is done based on known breast cancer genes | Breast cancer | mRNA expression, methylation, CNV | Therapeutic target genes of breast cancer | p-value: 3.6e-141 (rank cut-off: 20%) | (56) |
Supervised | Random forest and different classifiers | Features were extracted based on shrunken centroid and random forest based algorithm. Different classifiers were used. | Considers methylation patterns. Distinguishes early and late stages of cancer. | Papillary renal cell carcinoma | mRNA expression, methylation | Finds driver genes | Accuracy: 84.6% for random forest | (58) |
Semi-supervised | PLATYPUS | After training on labeled data, it co-trains with unlabeled data while accounting for missing data. | Important features are linked to drug sensitivity | Pan-cancer cell lines | mRNA expression, CNV, SNP | Predicts drug response | AUC: 0.9 | (59) |
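
Several rows of Table 1 report a concordance index as the performance measure for survival prediction. As an illustration only, the following is a minimal sketch of how Harrell's concordance index can be computed for right-censored data. It is not taken from any of the cited studies; the function name `concordance_index`, its argument names, and the choice to skip pairs with tied survival times are assumptions made for this example.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data (illustrative sketch).

    times       -- observed follow-up time per patient
    events      -- 1 if the event (e.g. death) was observed, 0 if censored
    risk_scores -- model output; a higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            # Simplifying assumption for this sketch: skip tied times.
            continue
        # Let `a` be the patient with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            # If the shorter time is censored, the true ordering is unknown.
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # higher risk for the earlier event: concordant
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: four patients, the third one is censored.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
perfect = concordance_index(times, events, [0.9, 0.7, 0.4, 0.1])
reversed_ = concordance_index(times, events, [0.1, 0.4, 0.7, 0.9])
constant = concordance_index(times, events, [0.5, 0.5, 0.5, 0.5])
```

Under this definition a value of 0.5 corresponds to uninformative predictions and 1.0 to a perfect ranking of patients by risk, which gives context for the reported values of roughly 0.68 to 0.8.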