TABLE 1.
Category | Approach | Method | Description | Strength | Weakness | Implementation |
---|---|---|---|---|---|---|
Concatenated clustering | Joint latent model | iCluster (iClusterPlus, iClusterBayes) | Assume all omics data originate from a low-dimensional latent matrix, which is used for the final clustering with a probabilistic model | Feature selection | Computationally intensive | R |
Concatenated clustering | Joint latent model | moCluster | Use sparse consensus principal components to define a set of latent variables for the final clustering | Efficient, with convergence to a deterministic solution | Delicate normalization procedure required | R |
Concatenated clustering | Low-rank approximation | LRAcluster | Assume different omics data are independent conditional on the stacked parameter matrix with low-rank constraints | Convex objective function leading to a global solution | No feature selection | R |
Concatenated clustering | Low-rank approximation | JIVE | Decompose each dataset into three parts: a low-rank approximation of joint variation, low-rank individual variation, and residual noise | Accounts for individual data variation; feature selection | Only applicable to continuous data; not robust to outliers | Matlab, R |
Concatenated clustering | Non-negative matrix factorization | jNMF (iNMF, intNMF) | Approximate each omics dataset by a product of two non-negative matrices and minimize the approximation error | Feature selection | Locally optimal solution only | Matlab, Python, R |
Concatenated clustering | K-means related | IS-K means | Extend sparse K-means to multi-omics data through normalization and incorporate prior knowledge to select biologically meaningful features | Can incorporate prior knowledge | Only applicable to continuous data; delicate normalization procedure required | R |
Concatenated clustering | Graph-based | PARADIGM | Develop a probabilistic graphical model and construct an integrated pathway activity matrix of features that can be used for clustering | Can incorporate prior knowledge | Pathway knowledge required; data must be submitted to the designated website to run the analysis | Web/API |
Clustering of clusters | Perturbation-aided | COCA | Implement a consensus clustering approach (generate perturbed datasets through resampling) | Applies directly to different omics data without normalization | No feature selection | NA |
Clustering of clusters | Perturbation-aided | PINS (PINSPlus) | Generate perturbed datasets by adding Gaussian noise to the original data and choose the optimal number of clusters through perturbation | Robust to noisy data | No feature selection | R |
Clustering of clusters | Similarity-based | Spectrum | Construct a sample-wise similarity matrix for each omics dataset using its proposed kernel, combine them into a Laplacian matrix, then apply spectral clustering to get the final clustering | Robust to noisy data; computationally efficient | No feature selection | R |
Clustering of clusters | Similarity-based | SNF (ab-SNF, NEMO) | Construct a sample-wise similarity matrix for each omics dataset, fuse the matrices, then perform the final clustering | Computationally efficient; can deal with mixed data types | No feature selection | R, Matlab |
Clustering of clusters | Similarity-based | CIMLR | Multiple kernel learning method that learns the similarity matrix best fitting the data through an optimization procedure built on a set of Gaussian kernels | Feature selection | Gaussian kernels only | R, Matlab |
Clustering of clusters | Similarity-based | rMKL-LPP | Multiple kernel learning method that simultaneously optimizes kernel weights and projects data into a lower-dimensional space | Flexibility to incorporate multiple different kernels | No feature selection | Upon request |
Interactive clustering | Dirichlet mixture model-based | MDI | Use a Dirichlet-multinomial mixture model with data dependence captured by parameters at the allocation level | Can deal with mixed data types; no requirement for a consistent clustering structure | Computationally intensive, with many parameters to specify | Matlab |
Interactive clustering | Dirichlet mixture model-based | BCC | Use a Dirichlet mixture model to simultaneously identify the dependence and heterogeneity across multi-omics data | Allows heterogeneity of multi-omics data when identifying the overall clustering | No feature selection; a consistent clustering structure required | R |
Interactive clustering | Dirichlet mixture model-based | PSDF | Use a two-level hierarchy of Dirichlet process mixture models to separate concordant samples with feature selection | Feature selection; no requirement for a consistent clustering structure | Only integrates two omics data types; discretization of input data required | Matlab |
Notes: Methods in parentheses are extensions of the original method that precedes them. NA, not available.
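
To make the "clustering of clusters" category more concrete, the following is a minimal sketch of the idea, not the COCA implementation itself. It assumes each omics layer has already been clustered separately, builds a co-association matrix (the fraction of layers in which two samples share a cluster), and then groups samples whose co-association exceeds a threshold. The resampling step of consensus clustering is omitted, and the thresholded connected-components step is a simplified stand-in for the final clustering. All function names, the `threshold` parameter, and the label vectors are hypothetical and chosen for illustration only.

```python
from itertools import combinations


def co_association(label_sets):
    """Fraction of omics layers in which each pair of samples shares a cluster."""
    n = len(label_sets[0])
    if any(len(labels) != n for labels in label_sets):
        raise ValueError("every layer must label the same samples")
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        matrix[i][i] = 1.0
    for i, j in combinations(range(n), 2):
        agree = sum(labels[i] == labels[j] for labels in label_sets)
        matrix[i][j] = matrix[j][i] = agree / len(label_sets)
    return matrix


def consensus_groups(matrix, threshold=0.5):
    """Group samples linked by co-association above the threshold.

    This is a connected-components search, used here as a simple
    stand-in for the final consensus clustering step.
    """
    n = len(matrix)
    group = [-1] * n
    next_id = 0
    for start in range(n):
        if group[start] != -1:
            continue
        group[start] = next_id
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if group[j] == -1 and matrix[i][j] > threshold:
                    group[j] = next_id
                    stack.append(j)
        next_id += 1
    return group


# Hypothetical per-layer cluster labels for six samples (illustrative only).
# Label IDs are arbitrary within a layer and need not match across layers.
expression_labels = [0, 0, 0, 1, 1, 1]
methylation_labels = [1, 1, 0, 0, 0, 0]
protein_labels = [0, 0, 0, 1, 1, 1]

layers = [expression_labels, methylation_labels, protein_labels]
matrix = co_association(layers)
groups = consensus_groups(matrix)
print(groups)  # [0, 0, 0, 1, 1, 1]
```

Because only cluster labels enter the co-association matrix, the layers never need to be put on a common scale, which is the strength listed for COCA in the table. The same property explains the listed weakness: the original features are discarded before integration, so no feature selection is possible at this stage.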