Author manuscript; available in PMC: 2023 May 1.
Published in final edited form as: Wiley Interdiscip Rev Comput Stat. 2021 Feb 7;14(3):e1553. doi: 10.1002/wics.1553

TABLE 1.

Summary of integrative multi-omics clustering methods under three categories and different approaches

| Category | Approach | Method | Description | Strength | Weakness | Implementation |
| --- | --- | --- | --- | --- | --- | --- |
| Concatenated clustering | Joint latent model | iCluster (iClusterPlus, iClusterBayes) | Assume all omics data originate from a low-dimensional latent matrix, which is used for the final clustering under a probabilistic model | Feature selection | Computationally intensive | R |
| Concatenated clustering | Joint latent model | moCluster | Use sparse consensus principal components to define a set of latent variables for the final clustering | Efficient, with convergence to a deterministic solution | Delicate normalization procedure required | R |
| Concatenated clustering | Low-rank approximation | LRAcluster | Assume different omics data are independent conditional on the stacked parameter matrix with low-rank constraints | Convex objective function leading to a global solution | No feature selection | R |
| Concatenated clustering | Low-rank approximation | JIVE | Decompose each dataset into three parts: a low-rank approximation for joint variation, low-rank individual variation, and residual noise | Accounts for individual data variation; feature selection | Only applicable to continuous data; not robust to outliers | Matlab, R |
| Concatenated clustering | Non-negative matrix factorization | jNMF (iNMF, intNMF) | Approximate each omics dataset by a product of two non-negative matrices and minimize the approximation error | Feature selection | Locally optimal solution only | Matlab, Python, R |
| Concatenated clustering | K-means related | IS-K means | Extend sparse K-means to multi-omics data through normalization and incorporate prior knowledge to select biologically meaningful features | Can incorporate prior knowledge | Only applicable to continuous data; delicate normalization procedure required | R |
| Concatenated clustering | Graph-based | PARADIGM | Develop a probabilistic graphical model and construct an integrated pathway activity matrix for features, which can be used for clustering | Can incorporate prior knowledge | Pathway knowledge required; data must be submitted to the designated website to run the analysis | Web/API |
| Clustering of clusters | Perturbation-aided | COCA | Implement a consensus clustering approach (generate perturbed datasets through resampling) | Applies directly to different omics data without normalization | No feature selection | NA |
| Clustering of clusters | Perturbation-aided | PINS (PINSPlus) | Generate perturbed datasets by adding Gaussian noise to the original data and choose the optimal number of clusters through perturbation | Robust to noisy data | No feature selection | R |
| Clustering of clusters | Similarity-based | Spectrum | Construct a sample-wise similarity matrix for each omics dataset using its proposed kernel, combine the matrices into a Laplacian matrix, and apply spectral clustering to get the final clustering | Robust to noisy data; computationally efficient | No feature selection | R |
| Clustering of clusters | Similarity-based | SNF (ab-SNF, NEMO) | Construct a sample-wise similarity matrix for each omics dataset, fuse the matrices, and then perform the final clustering | Computationally efficient; can deal with mixed data types | No feature selection | R, Matlab |
| Clustering of clusters | Similarity-based | CIMLR | Multiple kernel learning method that learns the similarity matrix that best fits the data through an optimization procedure built on a set of Gaussian kernels | Feature selection | Gaussian kernels only | R, Matlab |
| Clustering of clusters | Similarity-based | rMKL-LPP | Multiple kernel learning method that simultaneously optimizes kernel weights and projects the data into a lower-dimensional space | Flexibility to incorporate multiple different kernels | No feature selection | Upon request |
| Interactive clustering | Dirichlet mixture model-based | MDI | Use a Dirichlet-multinomial mixture model with data dependence captured by parameters at the allocation level | Can deal with mixed data types; no requirement for a consistent clustering structure | Computationally intensive, with many parameters to specify | Matlab |
| Interactive clustering | Dirichlet mixture model-based | BCC | Use a Dirichlet mixture model to simultaneously identify the dependence and heterogeneity across multi-omics data | Allows heterogeneity across multi-omics data when identifying the overall clustering | No feature selection; a consistent clustering structure required | R |
| Interactive clustering | Dirichlet mixture model-based | PSDF | Use a two-level hierarchy of Dirichlet process mixture models to separate concordant samples, with feature selection | Feature selection; no requirement for a consistent clustering structure | Integrates only two omics datasets; discretization of input data required | Matlab |

Notes: Methods in parentheses are extensions of the original method named before the parentheses. NA, not available.
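
To make the "clustering of clusters" category in Table 1 more concrete, the following is a minimal Python sketch of the general idea: cluster each omics layer separately, then combine the per-layer assignments into a final clustering. It is not the COCA or PINS algorithm. It assumes the per-layer labels are already available, and the function names (`co_association`, `consensus_clusters`), the toy labels, and the 0.5 threshold are all hypothetical choices made for illustration. The final step links samples by thresholding a co-association matrix, which is a simplification of the consensus clustering those methods use.

```python
from itertools import combinations


def co_association(label_sets):
    """Fraction of omics layers in which each pair of samples shares a cluster."""
    n = len(label_sets[0])
    if any(len(labels) != n for labels in label_sets):
        raise ValueError("every layer must label the same samples")
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        matrix[i][i] = 1.0
    for i, j in combinations(range(n), 2):
        agree = sum(labels[i] == labels[j] for labels in label_sets)
        matrix[i][j] = matrix[j][i] = agree / len(label_sets)
    return matrix


def consensus_clusters(matrix, threshold=0.5):
    """Group samples whose co-association exceeds the threshold (union-find).

    The threshold is an illustrative assumption, not a value from the table.
    """
    n = len(matrix)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if matrix[i][j] > threshold:
            parent[find(j)] = find(i)

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]


# Toy per-layer cluster labels for six samples (hypothetical data).
expression = [0, 0, 0, 1, 1, 1]
methylation = [0, 0, 1, 1, 1, 1]  # sample 2 is assigned differently here
mirna = [1, 1, 1, 0, 0, 0]        # same partition as expression, labels swapped

consensus = co_association([expression, methylation, mirna])
print(consensus_clusters(consensus))  # [0, 0, 0, 1, 1, 1]
```

Because the co-association matrix only records whether two samples share a cluster, the layers need no common normalization and label names need not match across layers, which mirrors the strength listed for COCA. The sketch also performs no feature selection, consistent with the weakness listed for that approach.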