2020 Jun 16;9(6):giaa064. doi: 10.1093/gigascience/giaa064

Table 1: Inter-modality data harmonization approaches with a restricted modality scope

| Method name | Strategy | Main advantages | Main limitations | Citation |
| --- | --- | --- | --- | --- |
| MDI | Bayesian Consensus Clustering | Identifies gene clusters across datasets with specific shared characteristics. Can model time-series data | Limited to querying a small subset of genes. Trained only on array data | [49] |
| RIMBANET | Bayesian MCMC | Integrates many data types simultaneously | Requires large quantities of multimodal data. Method was specifically designed for experiment | [50] |
| EPIP | Ensemble boosting | Effective in unbalanced datasets | Limitations of training data reduce model effectiveness in small datasets | [44] |
| EAGLE | Ensemble boosting | Uses higher-level features to buffer against overfitting | Custom genome-specific features need to be calculated for classification | [51] |
| PreSTIGE | Information theory | Outputs different specificity thresholds | Biased to cell type | [52] |
| TEPIC | Machine learning | Feature space improves result interpretability | Limited performance in gene-dense regions or with small sample sizes | [45] |
| iOmicsPASS | Network analysis | Produces a sparse set of easily interpretable biological interactions. Effective in heterogeneous datasets | Important markers that are poorly represented in biological networks can be lost in the analysis | [53] |
| LemonTree | Network analysis; Gibbs sampler; decision tree | Modular model parts for different cases | Trained on cancer data | [46] |
| PANDA | Network analysis; message passing | Accounts for lack of direct regulatory element interaction | Choice of convergence parameter affects results. Results may be difficult to interpret | [54] |
| PARADIGM | Network analysis; Probabilistic Graph Model | Robust to false-positive results | Training was performed on microarray data. Effectiveness in sequencing data unknown. Trained on cancer data | [48] |
| IM-PET | Random forest classifier | Expected to generalize to other species | Requires assembly of 4 manually derived scores | [55] |
| JEME | Random forest classifier; regression | Easily retrainable on different systems if sufficient data are available | At least 4 input data types are required | [56] |
| RIPPLE | Random forest classifier; regression | Generalizable to other biological conditions and cell types | Assumes balanced data categories | [57] |
| SVM-MAP | Support Vector Machine | Expected to generalize to multiple cancer types | Limited enhancer coverage in training data | [58] |
| ELMER | Wilcoxon rank-sum test | Identifies upstream master regulators | Restricted to methylation arrays in cancer | [47] |
| TENET | Wilcoxon rank-sum test | Expected to generalize to other biological systems | Targets group expression differences only | [59] |
| RegNetDriver | Wilcoxon rank-sum test | Provides a framework to construct tissue-specific regulatory networks | Requires assembly of multiple manually derived scores from system-specific steps | [60] |

The name, strategy, main advantages, and main limitations of each method are provided. Only a few major advantages and limitations are highlighted; many of these methods are highly nuanced. Each citation refers to the original manuscript for the method, where full details can be obtained.