Table 1:
Method name | Strategy | Main advantages | Main limitations | Citation |
---|---|---|---|---|
MDI | Bayesian Consensus Clustering | Identifies gene clusters across datasets with specific shared characteristics. Can model time-series data | Limited to querying a small subset of genes. Trained only on array data | [49] |
RIMBANET | Bayesian MCMC | Integrates many data types simultaneously | Requires large quantities of multimodal data. Method was specifically designed for experiment | [50] |
EPIP | Ensemble boosting | Effective in unbalanced datasets | Limitations of training data reduce model effectiveness in small datasets | [44] |
EAGLE | Ensemble boosting | Uses higher-level features to buffer against overfitting | Custom genome-specific features need to be calculated for classification | [51] |
PreSTIGE | Information theory | Outputs different specificity thresholds | Biased to cell type | [52] |
TEPIC | Machine learning | Feature space improves result interpretability | Limited performance in gene-dense regions or with small sample sizes | [45] |
iOmicsPASS | Network analysis | Produces a sparse set of easily interpretable biological interactions. Effective in heterogeneous datasets | Important markers that are poorly represented in biological networks can be lost in the analysis | [53] |
LemonTree | Network analysis; Gibbs sampler; decision tree | Modular model parts for different cases | Trained on cancer data | [46] |
PANDA | Network analysis; message passing | Accounts for lack of direct regulatory element interaction | Choice of convergence parameter affects results. Results may be difficult to interpret | [54] |
PARADIGM | Network analysis; probabilistic graphical model | Robust to false-positive results | Training was performed on microarray data. Effectiveness in sequencing data unknown. Trained on cancer data | [48] |
IM-PET | Random forest classifier | Expected to generalize to other species | Requires assembly of 4 manually derived scores | [55] |
JEME | Random forest classifier; regression | Easily retrainable on different systems if sufficient data are available | At least 4 input data types are required | [56] |
RIPPLE | Random forest classifier; regression | Generalizable to other biological conditions and cell types | Assumes balanced data categories | [57] |
SVM-MAP | Support Vector Machine | Expected to generalize to multiple cancer types | Limited enhancer coverage in training data | [58] |
ELMER | Wilcoxon rank-sum test | Identifies upstream master regulators | Restricted to methylation arrays in cancer | [47] |
TENET | Wilcoxon rank-sum test | Expected to generalize to other biological systems | Targets group expression differences only | [59] |
RegNetDriver | Wilcoxon rank-sum test | Provides a framework to construct tissue-specific regulatory networks | Requires assembly of multiple manually derived scores from system-specific steps | [60] |
The name, strategy, main advantages, and main limitations of each method are provided. Only a few major advantages and limitations are highlighted; many of these methods are highly nuanced. The citation for each method points to the original manuscript, where full details can be obtained.