Author manuscript; available in PMC: 2023 May 1.

Published in final edited form as: Wiley Interdiscip Rev Comput Stat. 2021 Feb 7;14(3):e1553. doi: 10.1002/wics.1553

TABLE 1.

Summary of integrative multi-omics clustering methods under three categories and different approaches

Category	Approach	Method	Description	Strength	Weakness	Implementation
Concatenated clustering	Joint latent model	iCluster (iClusterPlus, iClusterBayes)	Assume all omics data originate from a low-dimensional latent matrix, which can be used for the final clustering under a probabilistic model	Feature selection	Computationally intensive	R
		moCluster	Use the sparse consensus principal component to define a set of latent variables to get the final clustering	Efficient with convergence to a deterministic solution	Delicate normalization procedure required	R
	Low-rank approximation	LRAcluster	Assume different omics data are independent conditional on the stacked parameter matrix with low-rank constraints	Convex objective function leading to a global solution	No feature selection	R
		JIVE	Decompose each data set into three parts: a low-rank approximation of joint variation, low-rank individual variation, and residual noise	Accounts for individual data variation; feature selection	Only applicable to continuous data; not robust to outliers	Matlab, R
	Non-negative matrix factorization	jNMF (iNMF, intNMF)	Approximate each omics data set by a product of two non-negative matrices and minimize the approximation error	Feature selection	Locally optimal solution only	Matlab, Python, R
	K-means related	IS-K means	Extend sparse K-means for multi-omics data through normalization and incorporate prior knowledge to select biologically meaningful features	Can incorporate prior knowledge	Only applicable to continuous data; delicate normalization procedure required	R
	Graph-based	PARADIGM	Develop a probabilistic graphical model and construct an integrated pathway activity matrix for features, which can be used for clustering	Can incorporate prior knowledge	Pathway knowledge required; data must be submitted to the designated website to run the analysis	Web/API
Clustering of clusters	Perturbation-aided	COCA	Implement a consensus clustering approach (generate perturbed datasets through resampling)	Applies directly to different omics data without the need for normalization	No feature selection	NA
		PINS (PINSPlus)	Generate perturbed datasets by adding Gaussian noise to the original data and choose the optimal number of clusters through perturbation	Robust to data with noise	No feature selection	R
	Similarity-based	Spectrum	Construct a sample-wise similarity matrix for each omics data set using its proposed kernel, then combine them to construct a Laplacian matrix, followed by spectral clustering to get the final clustering	Robust to data with noise; computationally efficient	No feature selection	R
		SNF (ab-SNF, NEMO)	Construct a sample-wise similarity matrix for each omics data set first and then fuse them together, followed by the final clustering	Computationally efficient; can deal with mixed types of data	No feature selection	R, Matlab
		CIMLR	Multiple kernel learning method that learns the similarity matrix that best fits the data through an optimization procedure constructed from a set of Gaussian kernels	Feature selection	Gaussian kernels only	R, Matlab
		rMKL-LPP	Multiple kernel learning method that simultaneously optimizes kernel weights and projects data into a lower-dimensional space	Flexibility of incorporating multiple different kernels	No feature selection	Upon request
Interactive clustering	Dirichlet mixture model-based	MDI	Use a Dirichlet-multinomial mixture model with data dependence captured by parameters at the allocation level	Can deal with mixed types of data; no requirement for a consistent clustering structure	Computationally intensive, with many parameters to specify	Matlab
		BCC	Use a Dirichlet mixture model to simultaneously identify the dependence and heterogeneity across multi-omics data	Allows heterogeneity of multi-omics data when identifying the overall clustering	No feature selection; a consistent clustering structure required	R
		PSDF	Use a two-level hierarchy of Dirichlet process mixture models to separate concordant samples with feature selection	Feature selection; no requirement for a consistent clustering structure	Only integrates two omics data sets; discretization of input data required	Matlab

Notes: Methods in parentheses are extensions of the original method listed before the parentheses. NA, not available.
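
To make the "clustering of clusters" category more concrete, the following is a minimal Python sketch of the general idea behind COCA: cluster each omics layer separately, encode the resulting assignments as a binary indicator matrix, and cluster that matrix to obtain the final labels. This is an illustration under simplifying assumptions, not the published COCA procedure. A plain K-means with a deterministic farthest-point initialization stands in for both the per-layer clustering and the resampling-based consensus step, and the function names (`kmeans`, `coca_sketch`) and toy data are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=50):
    """Plain Lloyd's K-means; returns one integer label per point.

    Assumption for illustration: centers are initialized deterministically
    (first point, then repeatedly the point farthest from existing centers).
    """
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))

    labels = None
    for _ in range(n_iter):
        # Assign every point to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Move each center to the mean of its members (empty clusters stay put).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def coca_sketch(layers, k_per_layer, k_final):
    """Cluster each layer, one-hot encode the labels, then cluster the encoding.

    `layers` is a list of sample-by-feature matrices sharing the same sample
    order. The final K-means replaces COCA's consensus clustering step.
    """
    indicator = [[] for _ in layers[0]]
    for layer, k in zip(layers, k_per_layer):
        labels = kmeans(layer, k)
        for row, lab in zip(indicator, labels):
            row.extend(1.0 if lab == c else 0.0 for c in range(k))
    return kmeans(indicator, k_final)


# Toy data (made up): six samples measured on two hypothetical omics layers.
expression = [[0.1, 0.2], [0.0, 0.3], [0.2, 0.1],
              [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
methylation = [[0.9], [0.8], [0.85], [0.1], [0.2], [0.15]]

final_labels = coca_sketch([expression, methylation], [2, 2], 2)
print(final_labels)
```

On this toy input the first three samples fall into one cluster and the last three into the other. Because only cluster assignments are passed to the second stage, the layers do not need to be normalized to a common scale, which matches the strength listed for COCA in the table; the same design also explains why no feature selection is available at the integration step.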