GigaScience. 2020 Jun 16;9(6):giaa064. doi: 10.1093/gigascience/giaa064

Table 2: Inter-modality data harmonization approaches with a free modality scope

| Method name | Strategy | Main advantages | Main limitations | Citation |
|---|---|---|---|---|
| DeepMF | Deep learning and non-negative matrix factorization | Robust to noise and missing data | Manual parameter tuning and prior information may be required | [70] |
| JIVE | Dimensionality reduction | Identifies the global modes of variation that drive associations across and within data types | Not robust to outliers, missing values, or class imbalance | [71] |
| GCCA | Generalized canonical correlation analysis | Identifies blocks of variables within datasets for correlation across datasets | Less effective if the number of observations is smaller than the number of variables or if multiple linear correlations are present between datasets; biased towards strong variation in the data | [72] |
| NetICS | Graph diffusion | Robust to the frequency of aberrant genes in a sample | Can only examine effects of known genes present in a defined interaction network | [73] |
| DIABLO | Multivariate model and latent variable model | Captures quantitative information; visual outputs aid interpretation | Assumes a linear relationship between the selected omics features; parameter tuning is required | [20] |
| iCluster | Latent variable model | Captures both concordant and unique alterations across data types | Sensitive to initial subset selection; trained only on array data | [62] |
| GFA | Latent variable model | Accepts data with missing values | Manual parameter tuning; prior information may be required | [63] |
| MOFA | Latent variable model and probabilistic Bayesian framework | Leverages multi-omics data to impute missing values; single-cell version available | Assumes a linear relationship between the selected omics features; manual parameter tuning required | [74] |
| Seurat | Mutual nearest neighbours | Effective in intra-modality as well as inter-modality integration; robust to parameter changes | Restricted to single-cell data; requires robust reference data | [69] |
| SNF | Network analysis | Effective in small heterogeneous samples; captures quantitative information | Does not yield quantitative data; trained only on array data | [75] |
| NMF | Non-negative matrix factorization | Accounts for complex modular structures in multimodal data | Trained only on array data | [65] |
| iNMF | Non-negative matrix factorization | Stable even in heterogeneous conditions | Trained only on array data | [66] |
| LIGER | Non-negative matrix factorization | Effective in intra-modality as well as inter-modality integration; effective in highly divergent datasets | Restricted to single-cell data | [67] |
| sMBPLS | Sparse multi-block partial least-squares regression | Derives weights for modalities indicating contributions to expression | Performance is reduced with lower data dimensions | [76] |

Note that Seurat and LIGER are specific to single-cell data; the other methods are intended for bulk data. For each method, the name, strategy, and a few major advantages and limitations are given. Citations point to the original publication of each method, where full details can be obtained.
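Several of the methods in Table 2 (NMF, iNMF, LIGER, DeepMF, and the latent variable approaches such as MOFA, GFA, and iCluster) share a common underlying idea: samples measured across multiple omics blocks are projected onto a shared set of latent factors. The sketch below is a minimal illustration of that shared principle only, not an implementation of any listed method; the data shapes, feature blocks, and the use of scikit-learn's NMF on concatenated modalities are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
n_samples = 100

# Two hypothetical omics blocks measured on the same samples (values must be
# non-negative for NMF); real data would be normalized per modality first.
expr = rng.random((n_samples, 500))   # e.g., gene expression
meth = rng.random((n_samples, 300))   # e.g., methylation beta values

# Concatenating modalities along the feature axis means the factorization
# X ~= W @ H learns a single sample-by-factor matrix W shared by both blocks.
X = np.hstack([expr, meth])

model = NMF(n_components=10, init="nndsvda", max_iter=500, random_state=0)
W = model.fit_transform(X)   # samples x shared latent factors
H = model.components_        # latent factors x concatenated features

# Split the loadings back into per-modality blocks for interpretation.
H_expr = H[:, :expr.shape[1]]
H_meth = H[:, expr.shape[1]:]
print(W.shape, H_expr.shape, H_meth.shape)   # (100, 10) (10, 500) (10, 300)
```

In practice, the published methods add modality-specific structure that this plain concatenated factorization does not capture, for example dataset-specific factors in iNMF and LIGER or sparsity priors on the loadings in MOFA, which is why they handle heterogeneous or divergent datasets better than a naive joint decomposition.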