Abstract
Integration of heterogeneous, high-dimensional multi-omics data is becoming increasingly important for understanding the etiology of complex genetic diseases. Each omics technique provides only a limited view of the underlying biological process, and integrating heterogeneous omics layers simultaneously leads to a more comprehensive and detailed understanding of diseases and phenotypes. However, one obstacle in multi-omics data integration is the existence of unpaired multi-omics data caused by instrument sensitivity and cost: studies may fail if certain aspects of the subjects are missing or incomplete. In this paper, we propose a deep learning method for multi-omics integration with incomplete data, Cross-omics Linked unified embedding with Contrastive Learning and Self Attention (CLCLSA). Using complete multi-omics data as supervision, the model employs cross-omics autoencoders to learn feature representations across different types of biological data. Multi-omics contrastive learning is employed to maximize the mutual information between different omics types. In addition, feature-level and omics-level self-attention are employed to dynamically identify the most informative features for multi-omics data integration. Finally, a Softmax classifier performs the multi-omics data classification. Extensive experiments were conducted on four public multi-omics datasets. The experimental results indicate that the proposed CLCLSA produces promising results in multi-omics data classification using both complete and incomplete multi-omics data.
Keywords: Multi-omics integration, Incomplete omics data, Deep learning, Autoencoders, Contrastive learning
1. Introduction
The development of high-throughput omics technologies has revolutionized our ability to study biological systems at a molecular level [1]. These high-throughput techniques, including genomics, transcriptomics, proteomics and epigenomics, allow us to profile the genetic expression and interaction of molecules from different biological perspectives [2]. However, each omics technique only provides a limited view of the underlying biological process. Integrating heterogeneous omics layers simultaneously would lead to a more comprehensive and detailed understanding of diseases and phenotypes.
Due to high cost or limited instrument sensitivity, one or more omics data types of a biological sample may be missing. The presence of partially missing individual-level observations poses a major challenge in multi-omics data integration [3,4]. A general solution to multi-source data integration is to map the data from different omics layers into a common space and employ a single-omics integration method for downstream analysis [5,6]. However, explicitly projecting the features from different omics types results in information loss due to the heterogeneity of multi-omics data [7] and further decreases the performance of downstream tasks, making it challenging yet important to effectively utilize incomplete omics data. Another common solution is complete-case analysis, which considers only the set of subjects with complete observations across all omics data [8]. This approach is convenient; however, it reduces the sample size, and the trained model only reflects performance on the partial dataset. To fully use the entire dataset with incomplete multi-omics data, multi-omics imputation-based methods perform data completion on the raw input [9], which is challenging because of the high dimensionality of multi-omics data. Our study adopts an imputation-based strategy to fully use the entire dataset with incomplete multi-omics data for classification tasks. This approach is vital for advancing the understanding of complex biological systems, where data incompleteness is a common hurdle.
Numerous methods for integrating multi-omics data have been proposed, including neural network-based integration [10], machine learning-based integration [11], and pathway-based integration [12]. The neural network-based integration involves constructing networks for each omics data type and then integrating them to generate a more comprehensive network that captures the interactions between different omics layers, such as mRNA-miRNA interaction [13]. The machine learning-based integration uses machine learning models to integrate different types of omics data [2]. The pathway-based integration involves mapping omics data into known biological pathways to identify key pathways that are dysregulated in diseases [12]. Currently available integration methods for multi-omics data mostly focus on complete multi-omics data integration tasks while our method centers on imputing missing data to fully leverage the entire dataset for multi-omics classification tasks. By addressing the issue of missing data, our approach aims to enhance the robustness of multi-omics analyses, providing a valuable contribution to the field. Specifically, we employ supervised learning to guide the imputation module to generate a precise latent representation for missing data.
2. Contribution
In this paper, we propose a deep learning method for multi-omics integration with incomplete data by Cross-omics Linked unified embedding with Contrastive Learning and Self Attention (CLCLSA), which contains three key components, as shown in Fig. 1. The novelty and contribution of this study are as follows.
Fig. 1.

Architecture of the proposed Cross-omics Linked unified embedding with Contrastive Learning and Self Attention (CLCLSA). a) Cross-omics feature representation learning. To perform multi-omics integration using incomplete multi-omics data, CLCLSA first predicts the cross-omics feature representations using the existing omics data. The cross-omics feature representation learning module learns the complete set of omics layers for downstream tasks. Each subject has at least one omics layer; solid rectangles with black borders indicate that the omics data are present, while solid rectangles without black borders indicate that the omics data are missing. b) Self-attention-based omics-specific feature embedding and cross-omics feature integration. The embedded features from each omics are concatenated for multi-omics data classification. c) Contrastive learning for consistency learning. In this example, three types of omics data, including mRNA expression, DNA methylation and miRNA expression, are used for cancer classification with incomplete omics layers.
Utilizing complete multi-omics data as supervision, the model uses cross-omics autoencoders to learn the cross-omics feature representation. With cross-omics embedding, CLCLSA can reconstruct the incomplete multi-omics data by calculating the modality-specific representation.
The multi-omics contrastive learning is used to maximize the mutual information between different omics layers as shown in Fig. 1 (c).
Feature-level self-attention and omics-level self-attention are employed to dynamically select the most informative features for multi-omics data integration. The cross-omics autoencoders are used to restore missing omics data and the contrastive learning and self-attention are used to boost model performance.
The study introduces advancements in imputing incomplete multi-omics data. Efficient imputation methods are crucial for maximizing the utility of existing datasets and can lead to more accurate and comprehensive biological analyses, thereby supporting novel discoveries in various biological fields.
Extensive experiments were performed on four public datasets for multi-omics data classification. Experimental results indicated that CLCLSA achieved state-of-the-art performance on these tasks using both complete and incomplete multi-omics data. Our code is available at https://github.com/MIILab-MTU/CLCLSA.
3. Related work
3.1. Multi-view integration
Multi-view learning (MVL) is a strategy for fusing data from different modalities. MVL approaches can be divided into three categories: 1) co-training, 2) multi-kernel learning, and 3) subspace learning [14]. Co-training exchanges the separately trained models and provides pseudo labels carrying discriminative information between views to accelerate model training, and focuses more on semi-supervised learning tasks [15]. Multi-kernel learning aims at finding the mapping between different views with different kernels, and then combines the projected features for information fusion [16]. Subspace learning aims at finding the common space between different views, where the shared latent space contains the information of all views [17]. In subspace learning, autoencoders are widely used for extracting the common representation of multi-view data.
Machine learning and artificial intelligence have also shown promising results in multi-omics integration tasks. Xie et al. [18] performed early concatenation of raw features from different omics and employed a deep neural network for implicit feature embedding. Wang et al. [2] developed a deep learning based method for multi-omics data integration using graph convolutional network (GCN) for within omics feature integration and view correlation discovery network for cross-omics feature integration. Han et al. [19] developed a dynamic fusion method for trustworthy multi-omics data classification, which dynamically calculated the feature importance and modality importance for omics data integration. Although these methods achieved impressive performance for multi-omics data integration, they were designed for complete multi-omics data integration.
3.2. Incomplete multi-view representation learning
Missing data are common in multi-omics studies. They can arise for a variety of reasons, such as insufficient sample volume, instrument sensitivity, data collection cost and poor tissue quality [20]. Most incomplete multi-view integration methods adopt a two-stage training strategy, which first constructs complete multi-view observations using imputation methods [21,22] and then performs multi-omics integration based on the existing multi-view data and the imputed data. Lee employed the variational information bottleneck approach to integrate incomplete multi-omics data for classification [23]. Multi-omics factor analysis (MOFA) [24] and MOFA+ [25] are unsupervised integrative approaches that factorize each omics matrix into the product of two components; the shared component is used to generate the latent representation for downstream tasks. The multimodal variational autoencoder was proposed as an unsupervised approach for integrative multi-view analysis [26]. Though these unsupervised methods are capable of learning latent representations, task-relevant information may be lost because the labels of the biological samples are not used during training. Compared with existing incomplete multi-view integration methods, our proposed CLCLSA not only imputes the missing omics using unsupervised learning, but also uses supervised learning to guide the imputation module to generate a precise latent representation for the missing data. In addition, CLCLSA imputes the missing omics data in the latent space rather than the input space, which reduces the dimensionality of the imputation task and improves the quality of the imputed data.
3.3. Contrastive learning
Contrastive learning is one of the most effective unsupervised learning methods in the field of representation learning. Contrastive learning seeks a latent space that separates pairs from different classes and maximizes the similarity of samples from the same class [27]. Existing methods maximize the mutual information between augmented data pairs [27,28] and adopt InfoNCE as the contrastive loss. In contrast, our method uses the multi-view omics data as the contrasted pairs to maximize the consistency across different omics data.
4. Methodology
We assume that there are $N$ subjects with $M$ omics data types to be integrated. Each subject has a distinct feature set $\{x_m^{(n)}\}_{m=1}^{M}$, where $x_m^{(n)} \in \mathbb{R}^{d_m}$ represents the omics features of the $n$-th subject for the $m$-th omics, and $d_m$ is the feature dimension of the $m$-th omics. To avoid ambiguity between the indices of omics layers and subjects, we use $m$ in the subscript to represent the index of the omics layer and $n$ in the superscript to represent the index of the subject. The overview of our proposed CLCLSA is shown in Fig. 1.
4.1. Self-attention based dynamical integration
Feature-level self-attention.
We employ the self-attention component for feature-level and omics-level feature selection. Though the sparsity of high-dimensional data enables high-level feature embedding, the informativeness of individual features, which varies across subjects, is more important for multi-omics integration [29]. The feature-level attention scores are calculated as

$$\alpha_m^{(n)} = \sigma\big(f_m^{FA}(x_m^{(n)})\big) \tag{1}$$

where $f_m^{FA}$ is the feature-level self-attention encoder implemented by a multi-layer perceptron (MLP) and $\sigma$ is the sigmoid activation function. The features are further embedded by the omics-specific encoder $f_m^{E}$ using the calculated feature-level attention scores:

$$z_m^{(n)} = f_m^{E}\big(\alpha_m^{(n)} \odot x_m^{(n)}\big) \tag{2}$$

where $\odot$ represents element-wise multiplication.
Omics-level self-attention.
The importance of different modalities is not fixed during multi-omics fusion [30]. We further employ the omics-level self-attention encoder $f_m^{OA}$ to calculate the omics-level importance:

$$\beta_m^{(n)} = \sigma\big(f_m^{OA}(z_m^{(n)})\big) \tag{3}$$

where $f_m^{OA}$ is implemented using an MLP. The latent features are further weighted by the omics-level attention scores, as shown in Eq. (4):

$$\tilde{z}_m^{(n)} = \beta_m^{(n)} \, z_m^{(n)} \tag{4}$$
Multi-view fusion.
A nested multi-omics fusion is employed in this study, which concatenates the features produced by the feature-level and omics-level self-attention, as shown in Eq. (5):

$$h^{(n)} = \tilde{z}_1^{(n)} \oplus \tilde{z}_2^{(n)} \oplus \cdots \oplus \tilde{z}_M^{(n)} \tag{5}$$

where $\oplus$ indicates the concatenation operator and $h^{(n)}$ is the multi-omics representation. It integrates diverse omics data while preserving the relationships learned in the low-dimensional space, facilitating downstream tasks [31].
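As a concrete illustration, the attention-and-fusion pipeline of Eqs. (1)–(5) can be sketched in NumPy. Single random linear layers stand in for the MLP encoders, and all shapes and weight names (`W_fa`, `W_e`, `W_oa`) are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

M, dims, k = 2, [6, 4], 3                        # omics layers, feature dims, latent dim
x = [rng.normal(size=d) for d in dims]           # one subject's features per omics
W_fa = [rng.normal(size=(d, d)) for d in dims]   # feature-level attention encoders
W_e = [rng.normal(size=(d, k)) for d in dims]    # omics-specific encoders
W_oa = [rng.normal(size=(k, 1)) for _ in dims]   # omics-level attention encoders

fused = []
for m in range(M):
    alpha = sigmoid(x[m] @ W_fa[m])              # Eq. (1): feature-level attention scores
    z = (alpha * x[m]) @ W_e[m]                  # Eq. (2): attended omics-specific embedding
    beta = sigmoid(z @ W_oa[m])                  # Eq. (3): scalar omics-level attention score
    fused.append(beta * z)                       # Eq. (4): reweighted latent embedding
h = np.concatenate(fused)                        # Eq. (5): concatenated representation
```

The fused vector `h` has dimension equal to the sum of the per-omics latent dimensions and feeds the downstream classifier.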
4.2. Missing omics completion
The proposed CLCLSA is a deep generative model that aligns feature representations from different omics after building partially paired multi-omics data. The key idea of CLCLSA for missing omics completion is that it models the shared latent feature representation as a combination of the modality-specific subspaces of all existing modalities. Instead of completing a missing modality in the raw feature space, cross-omics representation learning is employed to complete the missing omics in the latent feature space. Suppose the latent feature representation for the $m$-th omics is $z_m^{(n)} \in \mathbb{R}^{k_m}$, where $k_m$ is the dimension of the latent space. In practice, we set $k_1 = k_2 = \cdots = k_M = k$, so that the latent representations of all omics layers have the same dimension. Suppose the index sets of the missing omics and the known omics for the $n$-th subject are $\mathcal{U}^{(n)}$ and $\mathcal{K}^{(n)}$; we employ autoencoders to perform cross-omics representation learning, as defined in Eq. (6):

$$\hat{z}_{m_1 \to m_2}^{(n)} = D_{m_2}\big(E_{m_1}(z_{m_1}^{(n)})\big) \tag{6}$$

where $E_{m_1}$ indicates the encoder for the $m_1$-th omics layer and $D_{m_2}$ indicates the decoder for the $m_2$-th omics layer; $\hat{z}_{m_1 \to m_2}^{(n)}$ is the predicted latent embedding of the $n$-th subject for the $m_2$-th omics obtained from the $m_1$-th omics data. The mean squared error between the predicted latent representation, i.e., $\hat{z}_{m_1 \to m_2}^{(n)}$, and the extracted feature embedding from the $m_2$-th omics, i.e., $z_{m_2}^{(n)}$, is used to optimize the weights of the autoencoder, as shown in Eq. (7):

$$\ell_{m_1 \to m_2} = \frac{1}{N} \sum_{n=1}^{N} \big\| \hat{z}_{m_1 \to m_2}^{(n)} - z_{m_2}^{(n)} \big\|_2^2 \tag{7}$$
The square of the L2-norm in the loss function corresponds to the mean squared error (MSE), a common choice in machine learning tasks including regression. Squaring is also computationally efficient and simplifies the loss calculation during training. This loss measures the discrepancy between the latent embedding of a subject for one omics predicted from another omics and the feature embedding extracted from that omics itself. Minimizing it compels the cross-omics feature embedding model to learn to impute feature embeddings from one omics to another.
Under the bi-view setting, the cross-omics data reconstruction loss is defined as

$$\mathcal{L}_{cr} = \ell_{1 \to 2} + \ell_{2 \to 1} \tag{8}$$

Under the multi-view setting, the cross-omics data reconstruction loss is defined in a fully connected fashion in Eq. (9):

$$\mathcal{L}_{cr} = \sum_{m_1=1}^{M} \sum_{m_2=1}^{M} \mathbb{1}[m_1 \neq m_2] \, \ell_{m_1 \to m_2} \tag{9}$$

where $\mathbb{1}[\cdot]$ is an indicator function. The indicator in Eq. (9) signifies that when $m_1 \neq m_2$, the cross-omics loss defined in Eq. (7) contributes to the cross-omics data reconstruction loss, where $m_1$ and $m_2$ are the indices of different omics data. Conversely, when $m_1 = m_2$, the model would merely predict an omics feature representation from the same omics data; this form of feature imputation is unnecessary, as it contributes zero information to cross-omics feature reconstruction. Therefore, the indicator function ensures that only cross-omics feature reconstruction losses are considered during model optimization.
However, optimizing the cross-omics loss of Eq. (8) or (9) alone can lead to trivial solutions in which the representations converge to a single point [32,33]. To overcome this issue, we train CLCLSA using both subjects with complete omics data and subjects with missing omics data. Subjects with complete omics data are fed into the network to generate the feature representations for all views and to train the autoencoders $(E_{m_1}, D_{m_2})$ between each pair of omics layers. Subjects with incomplete omics are fed into the network to generate the feature representations for the views with available omics data, and the trained autoencoders are used to generate the feature representations of the missing omics. Using both the complete representations, i.e., $z_m^{(n)}$, and the predicted representations, i.e., $\hat{z}_{m_1 \to m_2}^{(n)}$, we fuse them into the multi-omics data representation for downstream tasks. The detailed architecture of the autoencoder is described in Section 5.2.
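The fully connected cross-omics reconstruction objective described above can be sketched as follows. Random linear maps stand in for the trained cross-omics autoencoders, and all names and dimensions are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

N, M, k = 5, 3, 4                       # subjects, omics layers, shared latent dim
Z = rng.normal(size=(M, N, k))          # latent embeddings z_m^(n) from each omics encoder
AE = rng.normal(size=(M, M, k, k))      # linear stand-ins for the cross-omics autoencoders

def cross_recon_loss(Z, AE):
    # sum of MSE losses over all ordered cross-omics pairs (m1 != m2), Eq. (9)-style
    M = Z.shape[0]
    loss = 0.0
    for m1 in range(M):
        for m2 in range(M):
            if m1 == m2:
                continue                # indicator: skip same-omics "prediction"
            z_hat = Z[m1] @ AE[m1, m2]  # predict omics m2 latents from omics m1
            loss += np.mean(np.sum((z_hat - Z[m2]) ** 2, axis=1))  # MSE term, Eq. (7)-style
    return loss

loss = cross_recon_loss(Z, AE)
```

With M = 3 omics layers, the loop accumulates M(M − 1) = 6 directional reconstruction terms.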
4.3. Contrastive learning
Contrastive learning is a widely used technique in deep learning for incomplete multi-modality fusion, encouraging consistency between different modalities [32]. Most contrastive learning algorithms maximize the lower bound of the mutual information between the augmented samples and the raw training samples [28,34] using single-view data. In contrast, we maximize a lower bound on the mutual information between different omics layers. We adopt the contrastive learning loss proposed in Ref. [32] to maximize the entropy of the feature representation of each omics layer as well as the mutual information between different omics layers. The cross-view contrastive learning loss is defined as

$$\ell_{cl}(m_1, m_2) = -\sum_{i=1}^{k} \sum_{j=1}^{k} P_{ij} \ln \frac{P_{ij}}{P_i \, P_j} - \alpha \big( H(Z_{m_1}) + H(Z_{m_2}) \big) \tag{10}$$

where $Z_m \in \mathbb{R}^{N \times k}$ is the latent representation matrix of the $N$ subjects for the $m$-th omics and $P$ represents the joint probability distribution matrix of $Z_{m_1}$ and $Z_{m_2}$, as defined in Eq. (11):

$$P = \frac{1}{N} Z_{m_1}^{\top} Z_{m_2} \tag{11}$$

where $P_{ij}$ is the $(i, j)$-th element of $P$; $P_i = \sum_j P_{ij}$ and $P_j = \sum_i P_{ij}$ indicate the marginal probability distributions; $H(Z_m)$ is the entropy of the corresponding marginal distribution; and $\alpha$ is the hyperparameter balancing the cross-omics mutual information and the omics-specific entropy.
For multi-omics contrastive learning, we average the contrastive loss over each pair of omics layers, as shown in Eq. (12):

$$\mathcal{L}_{cl} = \frac{1}{M(M-1)} \sum_{m_1=1}^{M} \sum_{m_2=1}^{M} \mathbb{1}[m_1 \neq m_2] \, \ell_{cl}(m_1, m_2) \tag{12}$$

where $\mathbb{1}[\cdot]$ is an indicator function.
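The mutual-information/entropy trade-off described above can be sketched numerically for one omics pair. The sketch assumes softmax-normalized latent codes and a symmetrized joint matrix; both are assumptions for illustration, not confirmed details of the implementation:

```python
import numpy as np

rng = np.random.default_rng(2)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

N, k, alpha = 8, 4, 9.0
Z1 = softmax(rng.normal(size=(N, k)))     # row-normalized latent codes, omics 1
Z2 = softmax(rng.normal(size=(N, k)))     # row-normalized latent codes, omics 2

P = Z1.T @ Z2 / N                         # joint distribution over latent dimensions
P = (P + P.T) / 2                         # symmetrize (an assumption)
Pi, Pj = P.sum(axis=1), P.sum(axis=0)     # marginal distributions

eps = 1e-12
mi = np.sum(P * (np.log(P + eps) - np.log(np.outer(Pi, Pj) + eps)))   # mutual information
ent = -np.sum(Pi * np.log(Pi + eps)) - np.sum(Pj * np.log(Pj + eps))  # per-omics entropies
loss = -mi - alpha * ent                  # minimized => maximizes MI and entropy
```

Because each row of `Z1` and `Z2` sums to one, `P` sums to one and behaves as a proper joint distribution over latent dimensions.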
4.4. Loss function and training strategy
Given an incomplete multi-omics dataset $\{x_m^{(n)}\}$ and the classification gold standards $\{y^{(n)}\}$, we first feed the subjects with complete multi-omics data into CLCLSA. For the subjects with missing omics, CLCLSA completes the latent feature representations of the missing omics layer(s) using the cross-omics autoencoders defined in Section 4.2. The concatenated features are used for classification, and the classifier is denoted as $C$, as shown in Fig. 1 (b). In addition, for each omics layer, an auxiliary classifier is employed to boost model training, denoted as $C_m$ for the $m$-th omics layer, as shown in Fig. 1 (b).
During model training, we use the cross-entropy loss for the classifier $C$, as shown in Eq. (13):

$$\mathcal{L}_{c} = -\frac{1}{N} \sum_{n=1}^{N} \sum_{j} y_j^{(n)} \ln \hat{y}_j^{(n)} \tag{13}$$

where $\hat{y}^{(n)} = C(h^{(n)})$ is the classifier output.
Since both the omics attention score and the maximal Softmax probability of the omics auxiliary classifier reflect the classification confidence [19], we adopt the auxiliary classification loss for each omics as a regularizer for both the omics-specific attention encoder $f_m^{OA}$ and the omics-specific auxiliary classifier $C_m$, as defined in Eq. (14):

$$\mathcal{L}_{ac} = -\frac{1}{N} \sum_{n=1}^{N} \sum_{m=1}^{M} \sum_{j} y_j^{(n)} \ln \hat{y}_{m,j}^{(n)} \tag{14}$$

where $\hat{y}_m^{(n)} = C_m(\tilde{z}_m^{(n)})$ is the auxiliary classifier output for the $m$-th omics.
The overall loss function of CLCLSA is defined in Eq. (15):

$$\mathcal{L} = \mathcal{L}_{c} + \lambda_1 \mathcal{L}_{ac} + \lambda_2 \mathcal{L}_{cr} + \lambda_3 \mathcal{L}_{cl} \tag{15}$$

where $\lambda_1$, $\lambda_2$ and $\lambda_3$ are three hyperparameters balancing the weights of the auxiliary classification loss, the cross-omics data reconstruction loss and the multi-omics contrastive loss.
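The loss composition described above can be sketched as follows; the helper names and the example weight values are hypothetical:

```python
import numpy as np

def cross_entropy(y_onehot, probs, eps=1e-12):
    # mean cross-entropy between one-hot gold labels and Softmax outputs
    return -np.mean(np.sum(y_onehot * np.log(probs + eps), axis=1))

def total_loss(l_clf, l_aux, l_cr, l_cl, lam1, lam2, lam3):
    # weighted sum: lam1..lam3 scale the auxiliary classification,
    # cross-omics reconstruction and multi-omics contrastive terms
    return l_clf + lam1 * l_aux + lam2 * l_cr + lam3 * l_cl

y = np.array([[1.0, 0.0], [0.0, 1.0]])   # toy one-hot gold standard
p = np.array([[0.9, 0.1], [0.2, 0.8]])   # toy classifier outputs
l_clf = cross_entropy(y, p)
loss = total_loss(l_clf, l_aux=0.5, l_cr=0.3, l_cl=-1.0,
                  lam1=0.1, lam2=0.05, lam3=0.02)
```

Setting any weight to zero drops the corresponding term, which is how individual components can be disabled during training.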
5. Experiments and discussion
To validate the proposed CLCLSA model, we applied the proposed model on four empirical multi-omics datasets. Extensive experimental results indicated the superiority of the proposed method on multi-omics data classification tasks using both complete and incomplete multi-omics data. In addition, we also analyzed the hyperparameter settings and performed ablation studies to demonstrate the effectiveness of different components.
5.1. Datasets and comparison approaches
We conduct experiments on the following four widely used multi-omics datasets. For each subject in these four datasets, the mRNA expression, DNA methylation and miRNA expression data were used. We adopt the same data preprocessing pipeline as [2] and set the same feature dimension for each omics. To fairly compare the proposed CLCLSA with existing approaches, we follow the same protocol as [2,19] to build the training and testing sets for evaluation. The detailed subject numbers for each category are shown in Table S3.
ROSMAP dataset [35,36], which contains 351 subjects, is used for distinguishing Alzheimer’s Disease (AD) subjects from normal controls.
LGG dataset [37] is for low-grade glioma classification, which contains 246 grade-2 subjects and 264 grade-3 subjects.
BRCA dataset [38] is for breast invasive carcinoma PAM50 subtype classification, which contains 876 subjects among 5 classes.
KIPAN dataset [38] is used for kidney cancer type classification, which contains 658 subjects from 3 classes.
For the performance evaluations using complete multi-omics datasets, we compared the proposed method with the following 13 existing classification algorithms:
Five machine learning algorithms. K-nearest neighbor classifier (KNN) [39], Support Vector Machine (SVM) [40], Linear Regression with L1 regularization (LR), Random Forest (RF) [41] and fully connected neural networks (NN).
Adaptive group-regularized ridge regression (GRridge) [42]. GRridge is a method for adaptive group-regularized ridge regression with group-specific penalties for high-dimensional data classification.
Block partial least squares discriminant analysis (BPLSDA) and Block sparse partial least squares discriminant analysis (BSPLSDA) [43]. BPLSDA extends sparse generalized canonical correlation analysis to a classification framework, and BSPLSDA adds sparse constraints to BPLSDA.
Multi-Omics Graph cOnvolutional NETworks (MOGONET) [2]: MOGONET jointly explores omics-specific learning using graph convolution network and cross-omics correlation learning using view correlation discovery network for multi-omics integration and classification.
Trusted multi-view classification (TMC) [44]. TMC dynamically computes the trustworthiness of each modality for different subjects with reliable integration for multi-view classification.
Concatenation of final multimodal representations (CF) [45]. CF performs multi-view information fusion based on late fusion and compactness-based fusion.
Gated multimodal units for information fusion (GMU) [46]. GMU employs the gates for selecting the most important parts of the input of each modality to correctly generate the desired output.
Multi-modality dynamic fusion (MMDynamic) [19]. MMDynamic employs the feature-level and modality-specific gates for multi-modality data fusion.
Uncertainty-induced Incomplete Multi-View Data Classification (UIMC) [47]: UIMC models uncertainty in the missing view to generate the complete instances by sampling from the distribution. Then, the evidential classification is applied to leverage the uncertainty of each view, ensuring trustworthy exploitation of imputed data by the model.
Multi-Level Confidence Learning for Trustworthy Multimodal Classification (MLCLNet) [48]: MLCLNet employs a feature confidence learning mechanism to suppress redundant features and a graph convolution network to learn the corresponding structure for multi-view information fusion.
For the performance evaluation using incomplete multi-omics datasets, we compared the proposed method with the following 7 existing multi-modality fusion algorithms:
Kernel generalized CCA (KGCCA) [49]. KGCCA extends KCCA with a prior-defined graph between different modalities.
Sparse CCA (SCCA) [50]. SCCA extends CCA with modality-specific sparse penalty.
Multi-view Variational AutoEncoder (MVAE) [26,51]. MVAE extends variational autoencoders for latent feature extraction and employs the product of multivariate Gaussian distributions for information fusion. For samples with missing modalities, the trained MVAE generates the latent representation as the product of the latent features from the existing modalities.
Cross Partial Multi-View Networks (CPM) [52]. CPM performs feature embedding using incomplete multi-view data by focusing on the completeness and versatility of the feature embedding.
Dual Contrastive Prediction (DCP) [32]. DCP minimizes the conditional entropy through dual prediction to recover the missing views and employs a dual contrastive loss to learn consistent representations among different modalities.
Latent Heterogeneous Graph Network (LHGN) [53]: LHGN employs view-specific encoders to learn a common latent representation of different views and a latent heterogeneous graph construction layer with multi-head attention to aggregate features for classification.
For the above baseline models using the complete multi-omics datasets, we compared CLCLSA with the performance reported in Refs. [2,19]. For the models using the incomplete multi-omics datasets, we also performed a grid search to find the optimal hyperparameters and network settings and compared the performance of the best models. To evaluate the performance of CLCLSA in handling incomplete data, we manually created data with different missing rates. The missing rate is denoted as $\eta$ and is determined by the number of subjects with at most $M-1$ observed omics layers.
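Simulating incomplete data for evaluation can be sketched as below. The masking scheme (each flagged subject randomly drops omics layers but always keeps at least one observed) is a hypothetical choice for illustration, not necessarily the exact protocol used in the paper:

```python
import numpy as np

rng = np.random.default_rng(3)

def make_missing_mask(n_subjects, n_omics, missing_rate):
    """Boolean mask: True means the omics layer is observed for the subject."""
    mask = np.ones((n_subjects, n_omics), dtype=bool)
    n_flagged = int(round(missing_rate * n_subjects))
    for i in rng.choice(n_subjects, size=n_flagged, replace=False):
        keep = rng.integers(n_omics)          # guarantee one observed layer
        drop = rng.random(n_omics) < 0.5      # randomly drop the other layers
        drop[keep] = False
        mask[i, drop] = False
    return mask

mask = make_missing_mask(100, 3, 0.3)         # 100 subjects, 3 omics, 30% flagged
```

The constraint that every subject keeps at least one omics layer mirrors the setup in Fig. 1, where each subject has at least one available view.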
Evaluations.
To compare the proposed CLCLSA with the baseline approaches on the binary classification tasks of the ROSMAP and LGG datasets, we employed accuracy (ACC), F1-score (F1) and the area under the receiver operating characteristic curve (AUC). To evaluate the multi-class classification performance, we employed ACC, the weighted F1 score (WeightedF1) and the macro-averaged F1 score (MacroF1).
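These metrics are standard and can be computed with scikit-learn, for example; the labels and scores below are purely illustrative toy values:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

# binary case (ROSMAP/LGG style): toy labels and predicted probabilities
y_true = np.array([0, 1, 1, 0, 1])
y_prob = np.array([0.2, 0.9, 0.6, 0.4, 0.3])
y_pred = (y_prob >= 0.5).astype(int)

acc = accuracy_score(y_true, y_pred)      # ACC
f1 = f1_score(y_true, y_pred)             # F1
auc = roc_auc_score(y_true, y_prob)       # AUC

# multi-class case (BRCA/KIPAN style): weighted and macro-averaged F1
yt = np.array([0, 1, 2, 2, 1])
yp = np.array([0, 1, 2, 1, 1])
wf1 = f1_score(yt, yp, average="weighted")  # WeightedF1
mf1 = f1_score(yt, yp, average="macro")     # MacroF1
```

AUC is computed from the predicted probabilities rather than the thresholded labels, which is why `y_prob` is passed to `roc_auc_score`.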
5.2. Network architecture and implementation details
The proposed CLCLSA contains six components: the feature-level self-attention encoder $f_m^{FA}$, the omics-specific encoder $f_m^{E}$, the omics-level self-attention encoder $f_m^{OA}$, the omics-specific auxiliary classifier $C_m$, the final classifier $C$, and the omics-to-omics conditional sparse autoencoder $(E_{m_1}, D_{m_2})$. The $f_m^{FA}$ and $f_m^{OA}$ were implemented using MLPs. The $f_m^{E}$ was implemented by MLPs followed by ReLU activation functions and a Dropout layer (DP) with a dropout probability of 0.5. The autoencoder was implemented by MLPs followed by batch normalization (BN) layers [54] and ReLU activation functions. The $C_m$ and $C$ were implemented by MLPs with Softmax classifiers. The feature dimensions of the four multi-omics datasets and the settings of these components are shown in Table 1.
Table 1.
Feature dimensions of the four multi-omics datasets and the network settings of our CLCLSA. The feature dimensions represent the dimension of mRNA expression, DNA methylation and miRNA expression sequentially. The network settings are for the mRNA expression, DNA methylation and miRNA omics layers, respectively.
| Dataset | ROSMAP | LGG | BRCA | KIPAN |
|---|---|---|---|---|
| Categories | 2 | 2 | 5 | 3 |
| Feature dimensions | 200, 200, 200 | 2000, 2000, 548 | 1000, 1000, 503 | 2000, 2000, 445 |
| Feature-level attention | 200-200, 200-200, 200-200 | 2000-2000, 2000-2000, 548-548 | 1000-1000, 1000-1000, 503-503 | 2000-2000, 2000-2000, 445-445 |
| Omics-specific encoder | 200-300-ReLU-DP, 200-300-ReLU-DP, 200-300-ReLU-DP | 2000-200-ReLU-DP, 2000-200-ReLU-DP, 548-200-ReLU-DP | 1000-200-ReLU-DP, 1000-200-ReLU-DP, 503-200-ReLU-DP | 2000-200-ReLU-DP, 2000-200-ReLU-DP, 445-200-ReLU-DP |
| Omics-level attention | 300-1, 300-1, 300-1 | 200-1, 200-1, 200-1 | 200-1, 200-1, 200-1 | 200-1, 200-1, 200-1 |
| Cross-omics autoencoder | 300-64-BN-ReLU-32-ReLU-64-BN-ReLU-300 | 200-64-BN-ReLU-32-ReLU-64-BN-ReLU-200 | 200-64-BN-ReLU-32-ReLU-64-BN-ReLU-200 | 200-64-BN-ReLU-32-ReLU-64-BN-ReLU-200 |
| Auxiliary classifier | 300-2-Softmax, 300-2-Softmax, 300-2-Softmax | 200-2-Softmax, 200-2-Softmax, 200-2-Softmax | 200-5-Softmax, 200-5-Softmax, 200-5-Softmax | 200-3-Softmax, 200-3-Softmax, 200-3-Softmax |
| Final classifier | 900-2-Softmax | 600-2-Softmax | 600-5-Softmax | 600-3-Softmax |
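The cross-omics autoencoder column of Table 1 (e.g., the 200-64-BN-ReLU-32-ReLU-64-BN-ReLU-200 pattern) can be sketched with a NumPy forward pass; random weights and an inference-style batch normalization stand in for trained parameters, so the sketch only demonstrates shapes and layer order:

```python
import numpy as np

rng = np.random.default_rng(4)

def dense(x, d_in, d_out):
    # random linear layer standing in for trained weights
    return x @ rng.normal(scale=0.1, size=(d_in, d_out))

def bn(x):
    # inference-style batch normalization over the mini-batch
    return (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-5)

def relu(x):
    return np.maximum(x, 0.0)

def cross_omics_ae(z, k=200):
    # Table 1 pattern: k-64-BN-ReLU-32-ReLU-64-BN-ReLU-k
    h = relu(bn(dense(z, k, 64)))
    h = relu(dense(h, 64, 32))
    h = relu(bn(dense(h, 32, 64)))
    return dense(h, 64, k)

z = rng.normal(size=(10, 200))   # a batch of latent embeddings from one omics
z_hat = cross_omics_ae(z)        # predicted latent embeddings for another omics
```

The 32-unit bottleneck forces a compressed cross-omics mapping before decoding back to the shared latent dimension.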
Grid search.
To investigate the effectiveness of the different components of the proposed CLCLSA, we performed hyperparameter fine-tuning by grid search. In detail, we trained CLCLSA on the four datasets using different $\lambda_1$, $\lambda_2$ and $\lambda_3$, where each hyperparameter was set to one of {0, 0.01, 0.02, 0.05, 0.1, 1.0}; a value of 0 indicated that the corresponding component was removed during model training. Note that $\lambda_2$ was set to 0 when training with complete multi-omics data and set greater than 0 when training with incomplete multi-omics data. The optimal hyperparameters varied across datasets and missing rates.
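A grid-search driver matching this protocol might look like the following sketch. Here `evaluate` is a hypothetical callback that trains and scores one configuration, and the constrained hyperparameter is assumed to be the cross-omics reconstruction weight (`lam2`):

```python
from itertools import product

GRID = [0, 0.01, 0.02, 0.05, 0.1, 1.0]   # 0 removes the corresponding component

def grid_search(evaluate, complete_data=False):
    best_score, best_cfg = float("-inf"), None
    for lam1, lam2, lam3 in product(GRID, repeat=3):
        if complete_data and lam2 != 0:
            continue    # reconstruction weight fixed to 0 on complete data
        if not complete_data and lam2 == 0:
            continue    # reconstruction weight must be > 0 on incomplete data
        score = evaluate(lam1, lam2, lam3)
        if score > best_score:
            best_score, best_cfg = score, (lam1, lam2, lam3)
    return best_cfg, best_score

# toy objective standing in for "train and validate one configuration"
cfg, score = grid_search(lambda a, b, c: -(a - 0.05) ** 2 - (b - 0.1) ** 2 - c)
```

With the toy objective above, the search returns the grid point closest to its optimum, illustrating how the driver selects one weight combination per dataset and missing rate.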
We implemented CLCLSA using PyTorch 1.10 [55], and the experiments were carried out on a workstation with an NVIDIA RTX A6000 GPU. The Adam optimizer [56] with an initial learning rate of 0.0001 and a learning rate decay schedule was employed to optimize the model weights. The maximum number of training epochs was fixed at 2500 for all experiments. The batch sizes were set to 245, 357, 612 and 460 for the ROSMAP, LGG, BRCA and KIPAN datasets, respectively, with training times of 80 s, 75 s, 100 s and 85 s. With the trained model, the prediction times for one case were 0.009 s, 0.021 s, 0.008 s and 0.023 s, respectively.
5.3. Performance using complete multi-omics data
We compared our CLCLSA model with existing multi-view classification algorithms, as shown in Table 2. When using complete multi-omics data, the cross-omics autoencoders were removed and the reconstruction weight $\lambda_2$ was set to zero.
Table 2.
Comparison with the state-of-the-art algorithms for multi-omics data classification. The bold texts indicate the best performance.
| Dataset | ROSMAP | | | LGG | | |
|---|---|---|---|---|---|---|
| | ACC | F1 | AUC | ACC | F1 | AUC |
| KNN | 65.7 ± 3.6 | 67.1 ± 4.5 | 70.9 ± 4.5 | 72.9 ± 3.4 | 73.8 ± 3.8 | 79.9 ± 3.8 |
| SVM | 77.0 ± 2.4 | 77.8 ± 2.6 | 77.0 ± 2.6 | 75.4 ± 4.6 | 75.7 ± 4.6 | 75.4 ± 4.6 |
| LR | 69.4 ± 3.7 | 73.0 ± 3.5 | 77.0 ± 3.5 | 76.1 ± 1.8 | 76.7 ± 2.7 | 82.3 ± 2.7 |
| RF | 72.6 ± 2.9 | 73.4 ± 1.9 | 81.1 ± 1.9 | 74.8 ± 1.2 | 74.2 ± 1.0 | 82.3 ± 1.0 |
| NN | 75.5 ± 2.1 | 76.4 ± 2.5 | 82.7 ± 2.5 | 73.7 ± 2.3 | 74.8 ± 3.7 | 81.0 ± 3.7 |
| GRridge | 76.0 ± 3.4 | 76.9 ± 2.3 | 84.1 ± 2.3 | 74.6 ± 3.8 | 75.6 ± 4.4 | 82.6 ± 4.4 |
| BPLSDA | 74.2 ± 2.4 | 75.5 ± 2.5 | 83.0 ± 2.5 | 75.9 ± 2.5 | 73.8 ± 2.3 | 82.5 ± 2.3 |
| BSPLSDA | 75.3 ± 3.3 | 76.4 ± 2.1 | 83.8 ± 2.1 | 68.5 ± 2.7 | 66.2 ± 2.6 | 73.0 ± 2.6 |
| MOGONET | 81.5 ± 2.3 | 82.1 ± 1.2 | 87.4 ± 1.2 | 81.6 ± 1.6 | 81.4 ± 2.7 | 84.0 ± 2.7 |
| TMC | 82.5 ± 0.9 | 82.3 ± 0.6 | 88.5 ± 0.6 | 81.9 ± 0.8 | 81.5 ± 0.4 | 87.1 ± 0.4 |
| CF | 78.4 ± 1.1 | 78.8 ± 0.5 | 88.0 ± 0.5 | 81.1 ± 1.2 | 82.2 ± 0.4 | 88.1 ± 0.4 |
| GMU | 77.6 ± 2.5 | 78.4 ± 1.6 | 86.9 ± 1.6 | 80.3 ± 1.5 | 80.8 ± 1.2 | 88.6 ± 1.2 |
| MMDynamics | 84.2 ± 1.3 | 84.6 ± 0.7 | 91.2 ± 0.7 | 83.3 ± 1.0 | 83.7 ± 0.4 | 88.5 ± 0.4 |
| UIMC | 87.1 ± 0.0 | – | – | – | – | – |
| MLCLNet | 84.4 ± 1.5 | 85.2 ± 1.5 | 89.3 ± 1.1 | 83.5 ± 1.4 | 84.0 ± 1.3 | 88.6 ± 1.2 |
| KGCCA | 69.5 ± 0.0 | 68.0 ± 0.0 | 69.7 ± 0.0 | 75.7 ± 0.0 | 73.7 ± 0.0 | 75.9 ± 0.0 |
| SCCA | 81.0 ± 0.0 | 81.1 ± 0.0 | 81.0 ± 0.0 | 78.3 ± 0.0 | 78.7 ± 0.0 | 78.3 ± 0.0 |
| MVAE | 75.2 ± 2.6 | 75.0 ± 3.1 | 75.3 ± 2.5 | 82.6 ± 1.6 | 82.5 ± 1.6 | 82.7 ± 1.6 |
| CPM | 74.2 ± 2.4 | 72.9 ± 2.7 | 74.1 ± 2.4 | 76.8 ± 2.9 | 75.8 ± 5.3 | 76.8 ± 3.0 |
| DCP | 78.5 ± 0.0 | 80.5 ± 0.0 | 78.6 ± 0.0 | 79.4 ± 0.0 | 73.4 ± 0.0 | 77.8 ± 0.0 |
| LHGN | 75.9 ± 1.5 | 75.6 ± 2.3 | 76.0 ± 1.5 | 79.9 ± 1.5 | 80.3 ± 1.3 | 79.9 ± 1.6 |
| CLCLSA | 83.0 ± 0.6 | 83.6 ± 0.9 | 88.3 ± 0.0 | 85.0 ± 0.1 | 85.4 ± 0.4 | 90.6 ± 0.7 |

| Dataset | BRCA | | | KIPAN | | |
|---|---|---|---|---|---|---|
| | ACC | F1 | AUC | ACC | F1 | AUC |
| KNN | 74.2 ± 2.4 | 68.2 ± 2.5 | 73.0 ± 2.5 | 96.7 ± 1.1 | 96.0 ± 1.4 | 96.7 ± 1.1 |
| SVM | 72.9 ± 1.8 | 64.0 ± 1.7 | 70.2 ± 1.7 | 99.5 ± 0.3 | 99.4 ± 0.4 | 99.5 ± 0.3 |
| LR | 73.2 ± 1.2 | 64.2 ± 2.6 | 69.8 ± 2.6 | 97.4 ± 0.2 | 97.2 ± 0.4 | 97.4 ± 0.2 |
| RF | 75.4 ± 0.9 | 64.9 ± 1.3 | 73.3 ± 1.3 | 98.1 ± 0.6 | 97.5 ± 1.1 | 98.1 ± 0.6 |
| NN | 75.4 ± 2.8 | 66.8 ± 4.7 | 74.0 ± 4.7 | 99.1 ± 0.5 | 99.1 ± 0.5 | 99.1 ± 0.5 |
| GRridge | 74.5 ± 1.6 | 65.6 ± 2.5 | 72.6 ± 2.5 | 99.4 ± 0.4 | 99.3 ± 0.4 | 99.4 ± 0.4 |
| BPLSDA | 64.2 ± 0.9 | 36.9 ± 1.7 | 53.4 ± 1.7 | 93.3 ± 1.3 | 91.9 ± 2.1 | 93.3 ± 1.3 |
| BSPLSDA | 63.9 ± 0.8 | 35.1 ± 2.2 | 52.2 ± 2.2 | 91.9 ± 1.2 | 89.5 ± 1.4 | 91.8 ± 1.3 |
| MOGONET | 82.9 ± 1.8 | 77.4 ± 1.7 | 82.5 ± 1.7 | 99.9 ± 0.2 | 99.9 ± 0.2 | 99.9 ± 0.2 |
| TMC | 84.2 ± 0.5 | 80.6 ± 0.9 | 84.4 ± 0.9 | 99.7 ± 0.3 | 99.4 ± 0.5 | 99.7 ± 0.3 |
| CF | 81.5 ± 0.8 | 77.1 ± 0.9 | 81.5 ± 0.9 | 99.2 ± 0.5 | 98.8 ± 0.9 | 99.2 ± 0.5 |
| GMU | 80.0 ± 3.9 | 74.6 ± 5.8 | 79.8 ± 5.8 | 97.7 ± 1.6 | 95.8 ± 3.2 | 97.6 ± 1.7 |
| MMDynamics | 87.7 ± 0.3 | 84.5 ± 0.5 | 88.0 ± 0.5 | 99.9 ± 0.2 | 99.9 ± 0.3 | 99.9 ± 0.2 |
| UIMC | 82.9 ± 0.0 | – | – | – | – | – |
| MLCLNet | 86.4 ± 1.6 | 82.6 ± 1.8 | 87.8 ± 1.6 | 99.9 ± 0.7 | 99.2 ± 0.2 | 99.2 ± 0.2 |
| KGCCA | 73.3 ± 0.0 | 62.5 ± 0.0 | 71.0 ± 0.0 | 93.4 ± 0.0 | 88.3 ± 0.0 | 93.1 ± 0.0 |
| SCCA | 81.7 ± 0.0 | 76.8 ± 0.0 | 81.7 ± 0.0 | 93.9 ± 0.0 | 88.6 ± 0.0 | 93.6 ± 0.0 |
| MVAE | 75.9 ± 3.6 | 65.3 ± 5.2 | 73.4 ± 5.0 | 93.7 ± 1.5 | 92.9 ± 1.7 | 93.6 ± 1.6 |
| CPM | 78.0 ± 2.3 | 75.0 ± 2.4 | 78.4 ± 2.2 | 96.0 ± 1.9 | 95.3 ± 1.9 | 96.0 ± 1.9 |
| DCP | 81.7 ± 0.0 | 74.9 ± 0.0 | 82.3 ± 0.0 | 97.0 ± 0.0 | 94.7 ± 0.0 | 97.0 ± 0.0 |
| LHGN | 80.8 ± 0.6 | 76.6 ± 1.1 | 80.8 ± 0.7 | 99.0 ± 0.3 | 98.4 ± 0.7 | 99.0 ± 0.3 |
| CLCLSA | 87.5 ± 1.0 | 85.6 ± 0.6 | 87.8 ± 0.3 | 99.9 ± 0.3 | 99.9 ± 0.3 | 99.9 ± 0.3 |
For the binary classification tasks, we compared CLCLSA with the existing methods on the ROSMAP and LGG datasets. The experimental results demonstrated that the proposed CLCLSA significantly outperformed most methods on binary classification. On the ROSMAP dataset, CLCLSA achieved an average ACC of 83.0 %; on the LGG dataset, CLCLSA improved the ACC from 83.3 %, achieved by the second-best algorithm, to 85.0 %.
For the multi-class classification tasks, we compared CLCLSA with the existing methods on the BRCA and KIPAN datasets. On the BRCA dataset, the proposed method achieved results similar to MMDynamics, with an average ACC of 87.5 %. On the KIPAN dataset, the proposed method matched the performance of MMDynamics [19] and MOGONET [2]. Classifying kidney cancer types using the KIPAN dataset is a relatively simple task, so all methods achieved quite high performance.
5.4. Performance using incomplete multi-omics data
We further compared the proposed CLCLSA with other existing multi-view incomplete-data classification methods on these four public multi-omics datasets. We manually specified missing rates ranging from 0.1 to 0.8 with an increment of 0.1, and the performance of multi-omics data classification using incomplete multi-omics data is shown in Fig. 2. The detailed hyperparameter settings for the different missing rates are given in Tables S1 and S2. To simulate incompleteness at a given missing rate, we randomly selected the corresponding proportion of samples as incomplete subjects and randomly removed between one and all but one of the omics views from each of them.
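The incomplete-data simulation described above can be sketched as a view mask: a fraction of subjects (the missing rate) is marked incomplete, and each of those subjects loses at least one but never all of its omics views. The function name and interface below are illustrative, not the authors' code.

```python
# Sketch of the missing-view simulation: mask[i, v] is True when view v of
# subject i is observed. Incomplete subjects lose 1 to (num_views - 1) views.
import numpy as np

def make_view_mask(num_subjects, num_views, missing_rate, seed=0):
    """Return a boolean (num_subjects x num_views) observation mask."""
    rng = np.random.default_rng(seed)
    mask = np.ones((num_subjects, num_views), dtype=bool)
    # randomly select the incomplete subjects
    incomplete = rng.choice(num_subjects, size=int(missing_rate * num_subjects),
                            replace=False)
    for i in incomplete:
        # drop between 1 and num_views - 1 views, never all of them
        k = rng.integers(1, num_views)
        dropped = rng.choice(num_views, size=k, replace=False)
        mask[i, dropped] = False
    return mask

# e.g. a three-omics cohort of 351 subjects with a missing rate of 0.2
mask = make_view_mask(num_subjects=351, num_views=3, missing_rate=0.2)
```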
Fig. 2.

Classification performance with different missing rates on a) ROSMAP, b) LGG, c) BRCA and d) KIPAN datasets.
Fig. 2 shows that the CLCLSA method outperformed all other state-of-the-art multi-view classification methods on incomplete multi-omics data in most settings. For the binary classification tasks, CLCLSA achieved the highest ACCs on most datasets across the different missing rates.
On the ROSMAP dataset, CLCLSA achieved AUCs of 0.75 when the missing rate was smaller than 0.3. At a missing rate of 0.3, MVAE, SCCA and LHGN achieved a higher ACC than our CLCLSA.
On the LGG dataset, CLCLSA achieved the highest AUCs on incomplete multi-omics data across the different missing rates. In contrast to the CCA-based methods, i.e., KGCCA and SCCA, the ACC, F1-score and AUC obtained by CLCLSA dropped only slightly as the missing rate increased. CCA-based methods cannot impute missing views, and the results indicate that CLCLSA benefits from completing the missing omics with the cross-omics autoencoders.
For the multi-class classification tasks, the proposed CLCLSA achieved the highest ACCs and weighted F1-scores among all compared methods on the BRCA dataset when the missing rate was smaller than 0.5. The CCA-based methods cannot recover the feature representations of missing omics layers. Since they were trained only on the subset of training subjects with complete omics data, their performance dropped significantly on incomplete multi-omics data with large missing rates; with such limited training subjects, CCA-based methods cannot achieve high performance when the missing rate exceeds 0.5. In MVAE, multi-omics fusion is achieved by taking the product of the latent distributions of the available omics layers. Although MVAE employed 'view dropout' during model training [57], the features of the missing omics layers were not recovered, which lowered the performance on the downstream classification tasks. On the KIPAN dataset, the ACCs achieved by CLCLSA were higher than 88 % even at a missing rate of 0.8, and when the missing rate was smaller than 0.2, the performance did not decrease. These results indicate that the cross-omics autoencoders are capable of recovering the missing omics data from the latent feature representations of the available omics data.
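The cross-omics completion idea credited for this robustness can be sketched as follows: an encoder per omics plus a cross-omics predictor that reconstructs the latent code of a missing omics from an observed one. The layer sizes, the single-direction predictor, and the MSE completion loss are our own simplifying assumptions, not the authors' exact architecture.

```python
# Minimal sketch of a cross-omics autoencoder: encoders for two omics and a
# predictor that recovers the latent representation of a missing omics (b)
# from an observed omics (a). All dimensions are illustrative assumptions.
import torch
import torch.nn as nn

class CrossOmicsAE(nn.Module):
    def __init__(self, dim_a, dim_b, latent=64):
        super().__init__()
        self.enc_a = nn.Sequential(nn.Linear(dim_a, latent), nn.ReLU())
        self.enc_b = nn.Sequential(nn.Linear(dim_b, latent), nn.ReLU())
        self.a_to_b = nn.Linear(latent, latent)  # predicts omics-b latent from omics-a

    def completion_loss(self, x_a, x_b):
        # supervised by subjects for whom both omics are observed
        z_a, z_b = self.enc_a(x_a), self.enc_b(x_b)
        return nn.functional.mse_loss(self.a_to_b(z_a), z_b)

    def impute_b(self, x_a):
        # latent representation of the missing omics b, recovered from omics a
        return self.a_to_b(self.enc_a(x_a))

model = CrossOmicsAE(dim_a=1000, dim_b=500)
z_hat = model.impute_b(torch.randn(8, 1000))
```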
We further performed an ablation study on these four multi-omics datasets using partial omics layers. Under the two-view setting, the cross-omics loss reduces from Eq. (9) to Eq. (8). In Fig. 3, we compared the classification performance of CLCLSA across four omics combinations: mRNA + methy + miRNA (mRNA expression, DNA methylation and miRNA expression), mRNA + miRNA (mRNA expression and miRNA expression), mRNA + methy (mRNA expression and DNA methylation), and miRNA + methy (miRNA expression and DNA methylation).
Fig. 3.

Performance of full-omics and partial-omics data classification via CLCLSA. Results for a) ROSMAP, b) LGG, c) BRCA and d) KIPAN are depicted from top to bottom. mRNA + methy + miRNA refers to classification using full-omics data; mRNA + miRNA, mRNA + methy and miRNA + methy indicate classification using partial-omics data.
From Fig. 3, we can find that:
Integrating three types of omics proved advantageous for classification, particularly on complete data or when the missing rate was small. For example, on the LGG and BRCA datasets, integrating three types of omics was beneficial when the missing rates were smaller than 0.1; on the KIPAN dataset, integrating three omics layers was beneficial when the missing rates were smaller than 0.3.
Integrating mRNA expression data remained advantageous for multi-omics data classification even when the missing rate was high. On the ROSMAP and LGG datasets, the CLCLSA models trained with mRNA expression and DNA methylation (green dashed line) and those trained with mRNA expression and miRNA expression (orange dashed line) outperformed the CLCLSA models trained with three omics layers across most missing rates. However, the CLCLSA models trained with miRNA expression and DNA methylation (red dashed line) performed worse than the other three combinations for missing rates from 0 to 0.6. On the BRCA dataset, the CLCLSA models achieved the lowest ACCs, weighted F1-scores and macro F1-scores with miRNA expression and DNA methylation data for missing rates from 0 to 0.5. These results indicate that mRNA expression data was more important than DNA methylation and miRNA expression data for multi-omics data classification on these four datasets.
5.5. Hyperparameter settings and model performance
We further investigated the model performance under different hyperparameter settings, i.e., the balancing factors of the loss terms. We performed these comparison studies on the ROSMAP dataset with a fixed missing rate of 0.2. In Fig. 4(a), we fixed the weight of the auxiliary classification loss at 0.1 and tested the model performance under different weights of the contrastive loss and the cross-omics completion loss; in Fig. 4(b), we fixed the weight of the contrastive loss at 0.1 and tested the model performance under different weights of the auxiliary classification loss and the cross-omics completion loss; in Fig. 4(c), we fixed the weight of the cross-omics completion loss at 1 and tested the model performance under different weights of the auxiliary classification loss and the contrastive loss.
Fig. 4.

Hyperparameter analysis of CLCLSA training on the ROSMAP dataset with a fixed missing rate of 0.2. a) CLCLSA performance with a fixed balancing factor of the auxiliary classification loss; b) CLCLSA performance with a fixed balancing factor of the contrastive loss; c) CLCLSA performance with a fixed balancing factor of the cross-omics completion loss.
From Fig. 4(a), it was observed that the proposed CLCLSA achieved the highest ACC, F1-score and AUC with smaller balancing factors for the contrastive loss and the cross-omics completion loss. When the contrastive loss balancing factor was fixed at 0.01, monotonically decreasing the cross-omics completion balancing factor increased the model performance. This indicates that setting a high balancing factor for the cross-omics autoencoders degrades performance, because cross-omics latent feature completion is not as important as contrastive learning, which increases the mutual information between different omics and promotes the differentiability between subjects. A smaller balancing factor for the cross-omics prediction loss is therefore recommended in our CLCLSA model.
From Fig. 4(b), it was observed that the proposed CLCLSA achieved the highest performance with a smaller balancing factor for the auxiliary classification loss. The 3D bar charts showed that, with the cross-omics completion weight fixed, the model performance increased in most settings as the auxiliary classification weight decreased. This factor controls the importance of the auxiliary classifiers and the omics-specific self-attention encoder; however, it should not suppress the importance of the main classifier for multi-omics data classification. A smaller balancing factor for the auxiliary classification loss is recommended.
From Fig. 4(c), it was observed that setting a small contrastive loss factor and a small auxiliary classifier factor simultaneously was beneficial for model performance. The performance of CLCLSA degraded significantly when these two factors were set to 0.1 and 10.0 (orange bars). However, when both were set to small values simultaneously, such as 0.01 and 0.1, the performance of CLCLSA remained satisfactory and stable.
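The weighted multi-task objective tuned in Fig. 4 can be sketched as a simple weighted sum. The weight names `alpha`, `beta` and `gamma` below are our own placeholders for the balancing factors whose symbols were lost from the text; only the structure (main classification loss plus weighted contrastive, cross-omics completion, and auxiliary classification terms) is taken from this section.

```python
# Sketch of the combined objective: main classification loss plus weighted
# contrastive, cross-omics completion, and auxiliary classification losses.
# alpha/beta/gamma are placeholder names for the balancing factors.
def total_loss(l_cls, l_contrastive, l_completion, l_aux,
               alpha=0.1, beta=1.0, gamma=0.1):
    return l_cls + alpha * l_contrastive + beta * l_completion + gamma * l_aux

# e.g. a Fig. 4(a)-style setting: auxiliary weight fixed at 0.1,
# small contrastive and completion weights
loss = total_loss(0.6931, 0.5, 0.2, 0.7, alpha=0.01, beta=0.1, gamma=0.1)
```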
5.6. Ablation studies
We carried out ablation studies to investigate the roles of contrastive learning and of self-attention with the auxiliary classification loss in these classification tasks using incomplete multi-omics data. Since the cross-omics autoencoders are required to handle classification with incomplete multi-omics data, we kept the cross-omics completion loss fixed and performed the ablation studies on the ROSMAP dataset with three omics layers. The performance comparisons of the different components across different missing rates are shown in Fig. 5.
Fig. 5.

Ablation study on the ROSMAP dataset, where 'ctst' indicates that the contrastive learning loss was used and 'aux' indicates that the self-attention and auxiliary classifier were used. 'ctst + aux' indicates that both components were used and 'plain' indicates that neither was used.
By comparing the proposed model with the baseline models, i.e., the ctst, aux and plain models, we observed that all components of CLCLSA improved the model performance in most settings. Without contrastive learning, self-attention and the auxiliary classifier, the plain models achieved the lowest ACCs across the different missing rates. When using either the contrastive loss or the auxiliary classification loss alone, the model showed a slight improvement in F1-scores. However, employing both contrastive learning and the auxiliary classifiers with the confidence loss boosted the performance significantly across the different missing rates. This ablation study indicates that all loss terms play indispensable roles in these incomplete multi-omics classification tasks.
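The contrastive term examined in this ablation can be illustrated with a standard cross-omics InfoNCE loss: embeddings of the same subject from two omics layers are pulled together while the other subjects in the batch act as negatives. This is a generic sketch of that family of losses, not the authors' exact formulation; the temperature and dimensions are assumptions.

```python
# Generic cross-omics InfoNCE sketch: positives are same-subject embedding
# pairs across two omics; other subjects in the batch are negatives.
import torch
import torch.nn.functional as F

def info_nce(z_a, z_b, temperature=0.5):
    z_a, z_b = F.normalize(z_a, dim=1), F.normalize(z_b, dim=1)
    logits = z_a @ z_b.t() / temperature      # batch x batch cosine similarities
    targets = torch.arange(z_a.size(0))       # positives lie on the diagonal
    return F.cross_entropy(logits, targets)

# embeddings of 16 subjects from two omics layers (illustrative sizes)
loss = info_nce(torch.randn(16, 64), torch.randn(16, 64))
```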
5.7. Limitation and future work
The main limitation of this study lies in scalability validation. The experiments were based on a fixed number of selected genes for each public dataset. In the future, the proposed CLCLSA should be tested with different numbers of genes to validate its scalability. In addition, the models were tested on a single cohort for each multi-omics classification task. We recognize the significance of incorporating subjects from diverse cohorts to enhance the generalizability of our findings in future studies.
In addition to its immediate applications in genomics data integration, our proposed methodology holds promising potential for addressing missing-data challenges across diverse domains. Future investigations could explore the adaptability and efficacy of our approach in contexts beyond genomics, such as healthcare, finance, and multi-view computer vision integration tasks. By addressing the broader landscape of missing-data issues in various fields, we aim to establish the versatility of our methodology.
6. Conclusion
In this paper, we proposed a novel algorithm for multi-omics integration and classification, which can jointly exploit all training samples and flexibly handle training samples with arbitrarily missing omics data. Our CLCLSA model employs cross-omics autoencoders to predict the representations of missing omics data and uses contrastive learning and self-attention modules to boost model performance. Extensive experiments were conducted on four public multi-omics datasets. The experimental results indicate that our proposed CLCLSA produces promising results in multi-omics data classification using both complete and incomplete multi-omics data.
Supplementary Material
Acknowledgments
This research was supported in part by grants from the National Institutes of Health, USA (P20GM109036, R01AR069055, U19AG055373, R15HL172198). It was also supported in part by a seed grant from the Michigan Technological University Institute of Computing and Cybersystems and a graduate fellowship from the Michigan Technological University Health Research Institute.
Footnotes
CRediT authorship contribution statement
Chen Zhao: Conceptualization, Methodology, Software, Validation, Visualization, Writing – original draft, Writing – review & editing. Anqi Liu: Data curation, Writing – original draft, Writing – review & editing. Xiao Zhang: Data curation. Xuewei Cao: Data curation, Writing – original draft, Writing – review & editing. Zhengming Ding: Data curation, Writing – original draft. Qiuying Sha: Data curation, Writing – original draft, Writing – review & editing. Hui Shen: Data curation, Funding acquisition, Supervision, Writing – original draft, Writing – review & editing. Hong-Wen Deng: Conceptualization, Data curation, Funding acquisition, Investigation, Project administration, Supervision, Writing – original draft, Writing – review & editing. Weihua Zhou: Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Project administration, Supervision, Writing – original draft, Writing – review & editing.
Declaration of competing interest
The authors declare that there are no conflicts of interest.
Appendix A. Supplementary data
Supplementary data to this article can be found online at https://doi.org/10.1016/j.compbiomed.2024.108058.
References
- [1].Kim D, Li R, Dudek SM, Ritchie MD, Athena, Identifying interactions between different levels of genomic data associated with cancer clinical outcomes using grammatical evolution neural network, BioData Min. 6 (2013) 1–14. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [2].Wang T, Shao W, Huang Z, Tang H, Zhang J, Ding Z, Huang K, MOGONET integrates multi-omics data using graph convolutional networks allowing patient classification and biomarker identification, Nat. Commun 12 (2021) 3445, 10.1038/s41467-021-23774-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [3].Cao Z-J, Gao G, Multi-omics single-cell data integration and regulatory inference with graph-linked embedding, Nat. Biotechnol (2022), 10.1038/s41587-022-01284-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [4].Das S, Mukhopadhyay I, TiMEG: an integrative statistical method for partially missing multi-omics data, Sci. Rep 11 (2021) 1–16. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [5].Gao C, Liu J, Kriebel AR, Preissl S, Luo C, Castanon R, Sandoval J, Rivkin A, Nery JR, Behrens MM, Iterative single-cell multi-omic integration using online learning, Nat. Biotechnol 39 (2021) 1000–1007. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [6].Welch JD, Kozareva V, Ferreira A, Vanderburg C, Martin C, Macosko EZ, Single-cell multi-omic integration compares and contrasts features of brain cell identity, Cell 177 (2019) 1873–1887, e17. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [7].Chen H, Lareau C, Andreani T, Vinyard ME, Garcia SP, Clement K, Andrade-Navarro MA, Buenrostro JD, Pinello L, Assessment of computational methods for the analysis of single-cell ATAC-seq data, Genome Biol. 20 (2019), 1–25. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [8].Fang Z, Ma T, Tang G, Zhu L, Yan Q, Wang T, Celedón JC, Chen W, Tseng GC, Bayesian integrative model for multi-omics data with missingness, Bioinformatics 34 (2018) 3801–3808. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [9].Zhou X, Chai H, Zhao H, Luo C-H, Yang Y, Imputing missing RNA-sequencing data from DNA methylation by using a transfer learning–based neural network, GigaScience 9 (2020) giaa076. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [10].Koh HWL, Fermin D, Vogel C, Choi KP, Ewing RM, Choi H, iOmicsPASS: network-based integration of multiomics data for predictive subnetwork discovery Npj Syst Biol Appl 5 (2019) 22, 10.1038/s41540-019-0099-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [11].Chaudhary K, Poirion OB, Lu L, Garmire LX, Deep learning–based multi-omics integration robustly predicts survival in liver cancer, Clin. Cancer Res 24 (2018) 1248–1259, 10.1158/1078-0432.CCR-17-0853. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [12].Luo W, Brouwer C, Pathview: an R/Bioconductor package for pathway-based data integration and visualization, Bioinformatics 29 (2013) 1830–1831, 10.1093/bioinformatics/btt285. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [13].Kaczmarek E, Jamzad A, Imtiaz T, Nanayakkara J, Renwick N, Mousavi P, Multi-Omic graph transformers for cancer classification and interpretation, in: Biocomputing 2022, WORLD SCIENTIFIC, Kohala Coast, Hawaii, USA, 2021, pp. 373–384, 10.1142/9789811250477_0034. [DOI] [PubMed] [Google Scholar]
- [14].Li Y, Yang M, Zhang Z, A survey of multi-view representation learning, IEEE Trans. Knowl. Data Eng 31 (2019) 1863–1883, 10.1109/TKDE.2018.2872063. [DOI] [Google Scholar]
- [15].Blum A, Mitchell T, Combining labeled and unlabeled data with co-training, in: Proceedings of the Eleventh Annual Conference on Computational Learning Theory, 1998, pp. 92–100. [Google Scholar]
- [16].Gehler P, Nowozin S, On feature combination for multiclass object classification, in: IEEE 12th International Conference on Computer Vision, IEEE, Kyoto, 2009, pp. 221–228, 10.1109/ICCV.2009.5459169, 2009. [DOI] [Google Scholar]
- [17].Horst P, Generalized canonical correlations and their application to experimental data, J. Clin. Psychol 17 (1961) 331–347. [DOI] [PubMed] [Google Scholar]
- [18].Xie G, Dong C, Kong Y, Zhong JF, Li M, Wang K, Group lasso regularized deep learning for cancer prognosis from multi-omics and clinical features, Genes 10 (2019) 240. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [19].Han Z, Yang F, Huang J, Zhang C, Yao J, Multimodal Dynamics: Dynamical Fusion for Trustworthy Multimodal Classification, (n.d.). [Google Scholar]
- [20].Flores JE, Claborne DM, Weller ZD, Webb-Robertson B-JM, Waters KM, Bramer LM, Missing data in multi-omics integration: recent advances through artificial intelligence, Front. Artif. Intell 6 (2023) 1098308, 10.3389/frai.2023.1098308. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [21].Cai T, Cai TT, Zhang A, Structured matrix completion with applications to genomic data integration, J. Am. Stat. Assoc 111 (2016) 621–633. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [22].Tran L, Liu X, Zhou J, Jin R, Missing modalities imputation via cascaded residual autoencoder, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1405–1414. [Google Scholar]
- [23].Lee C, van der Schaar M, A Variational Information Bottleneck Approach to Multi-Omics Data Integration, 2021. http://arxiv.org/abs/2102.03014. August 2, 2022.
- [24].Argelaguet R, Velten B, Arnol D, Dietrich S, Zenz T, Marioni JC, Buettner F, Huber W, Stegle O, Multi-Omics Factor Analysis—a framework for unsupervised integration of multi-omics data sets, Mol. Syst. Biol 14 (2018) e8124. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [25].Argelaguet R, Arnol D, Bredikhin D, Deloro Y, Velten B, Marioni JC, Stegle O, MOFA+: a statistical framework for comprehensive integration of multi-modal single-cell data, Genome Biol. 21 (2020) 1–17. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [26].Shi Y, Siddharth N, Paige B, Torr PHS, Variational Mixture-Of-Experts Autoencoders for Multi-Modal Deep Generative Models, 2019. http://arxiv.org/abs/1911.03393. August 7, 2022.
- [27].Tian Y, Krishnan D, Isola P, Contrastive Multiview Coding, 2020. http://arxiv.org/abs/1906.05849. January 23, 2023.
- [28].Chen T, Kornblith S, Norouzi M, Hinton G, A simple framework for contrastive learning of visual representations, in: Proceedings of the 37th International Conference on Machine Learning, PMLR, 2020, pp. 1597–1607, in: https://proceedings.mlr.press/v119/chen20j.html. January 23, 2023. [Google Scholar]
- [29].Fisher A, Rudin C, Dominici F, All models are wrong, but many are useful: learning a variable’s importance by studying an entire class of prediction models simultaneously, J. Mach. Learn. Res 20 (2019) 1–81. [PMC free article] [PubMed] [Google Scholar]
- [30].Hou H, Zheng Q, Zhao Y, Pouget A, Gu Y, Neural correlates of optimal multisensory decision making under time-varying reliabilities with an invariant linear probabilistic population code, Neuron 104 (2019) 1010–1021, e10. [DOI] [PubMed] [Google Scholar]
- [31].Miao Z, Humphreys BD, McMahon AP, Kim J, Multi-omics integration in the age of million single-cell data, Nat. Rev. Nephrol 17 (2021) 710–724. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [32].Lin Y, Gou Y, Liu X, Bai J, Lv J, Peng X, Dual contrastive prediction for incomplete multi-view representation learning, IEEE Trans. Pattern Anal. Mach. Intell (2022) 1–14, 10.1109/TPAMI.2022.3197238. [DOI] [PubMed] [Google Scholar]
- [33].Tu X, Cao Z-J, Xia C-R, Mostafavi S, Gao G, Cross-Linked Unified Embedding for Cross-Modality Representation Learning, (n.d.). [Google Scholar]
- [34].He K, Fan H, Wu Y, Xie S, Girshick R, Momentum contrast for unsupervised visual representation learning, in: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, Seattle, WA, USA, 2020, pp. 9726–9735, 10.1109/CVPR42600.2020.00975. [DOI] [Google Scholar]
- [35].Bennett DA, Schneider JA, Arvanitakis Z, Wilson RS, Overview and findings from the religious orders study, Curr. Alzheimer Res 9 (2012) 628–645. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [36].Bennett DA, Schneider JA, S Buchman A, L Barnes L, A Boyle P, Wilson RS, Overview and findings from the rush memory and aging project, Curr. Alzheimer Res 9 (2012) 646–663. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [37].C.G.A.R. Network, Comprehensive, integrative genomic analysis of diffuse lowergrade gliomas, N. Engl. J. Med 372 (2015) 2481–2498. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [38].The Cancer Genome Atlas Research Network, Weinstein JN, Collisson EA, Mills GB, Shaw KRM, Ozenberger BA, Ellrott K, Shmulevich I, Sander C, Stuart JM, The cancer genome atlas pan-cancer analysis project, Nat. Genet 45 (2013) 1113–1120, 10.1038/ng.2764. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [39].Fix E, Discriminatory Analysis: Nonparametric Discrimination, Consistency Properties, USAF school of Aviation Medicine, 1985. [Google Scholar]
- [40].Cortes C, Vapnik V, Support-vector networks, Mach. Learn 20 (1995) 273–297. [Google Scholar]
- [41].Ho TK, Random decision forests, in: Proceedings of 3rd International Conference on Document Analysis and Recognition, IEEE, 1995, pp. 278–282. [Google Scholar]
- [42].van de Wiel MA, Lien TG, Verlaat W, van Wieringen WN, Wilting SM, Better prediction by use of co-data: adaptive group-regularized ridge regression, Stat. Med 35 (2016) 368–381, 10.1002/sim.6732. [DOI] [PubMed] [Google Scholar]
- [43].Singh A, Shannon CP, Gautier B, Rohart F, Vacher M, Tebbutt SJ, Lê Cao K-A, DIABLO: an integrative approach for identifying key molecular drivers from multi-omics assays, Bioinformatics 35 (2019) 3055–3062, 10.1093/bioinformatics/bty1054. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [44].Han Z, Zhang C, Fu H, Zhou JT, Trusted multi-view classification with dynamic evidential fusion, IEEE Trans. Pattern Anal. Mach. Intell 45 (2023) 2551–2566, 10.1109/TPAMI.2022.3171983. [DOI] [PubMed] [Google Scholar]
- [45].Hong D, Gao L, Yokoya N, Yao J, Chanussot J, Du Q, Zhang B, More diverse means better: multimodal deep learning meets remote-sensing imagery classification, IEEE Trans. Geosci. Rem. Sens 59 (2021) 4340–4354, 10.1109/TGRS.2020.3016820. [DOI] [Google Scholar]
- [46].Arevalo J, Solorio T, Montes-y-Gómez M, González FA, Gated Multimodal Units for Information Fusion, 2017. http://arxiv.org/abs/1702.01992. March 8, 2023.
- [47].Xie M, Han Z, Zhang C, Bai Y, Hu Q, Exploring and exploiting uncertainty for incomplete multi-view classification, in: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, Canada, Vancouver, BC, 2023, pp. 19873–19882, 10.1109/CVPR52729.2023.01903. [DOI] [Google Scholar]
- [48].Zheng X, Tang C, Wan Z, Hu C, Zhang W, Multi-level confidence learning for trustworthy multimodal classification, AAAI 37 (2023) 11381–11389, 10.1609/aaai.v37i9.26346. [DOI] [Google Scholar]
- [49].Tenenhaus A, Philippe C, Frouin V, Kernel generalized canonical correlation analysis, Comput. Stat. Data Anal 90 (2015) 114–131, 10.1016/j.csda.2015.04.004. [DOI] [Google Scholar]
- [50].Gao C, Ma Z, Zhou HH, Sparse CCA, Adaptive estimation and computational barriers, Ann. Stat 45 (2017), 10.1214/16-AOS1519. [DOI] [Google Scholar]
- [51].Zhao C, Keyak JH, Cao X, Sha Q, Wu L, Luo Z, Zhao L, Tian Q, Qiu C, Su R, Shen H, Deng H-W, Zhou W, Multi-view Information Fusion Using Multi-View Variational Autoencoders to Predict Proximal Femoral Strength, 2022, 10.48550/arXiv.2210.00674. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [52].Zhang C, Han Z, yajie cui, Fu H, Zhou JT, Hu Q, CPM-nets: cross partial multi-view networks, in: Advances in Neural Information Processing Systems, Curran Associates, Inc., 2019, in: https://proceedings.neurips.cc/paper/2019/hash/11b9842e0a271ff252c1903e7132cd68-Abstract.html. February 25, 2023. [Google Scholar]
- [53].Zhu P, Yao X, Wang Y, Cao M, Hui B, Zhao S, Hu Q, Latent heterogeneous graph network for incomplete multi-view learning, IEEE Trans. Multimed 25 (2023) 3033–3045, 10.1109/TMM.2022.3154592. [DOI] [Google Scholar]
- [54].Ioffe S, Szegedy C, Batch normalization: accelerating deep network training by reducing internal covariate shift, in: International Conference on Machine Learning, PMLR, 2015, pp. 448–456. [Google Scholar]
- [55].Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, Killeen T, Lin Z, Gimelshein N, Antiga L, Pytorch: an imperative style, high-performance deep learning library, Adv. Neural Inf. Process. Syst 32 (2019). [Google Scholar]
- [56].Kingma DP, Ba J, Adam, A Method for Stochastic Optimization, 2014 arXiv Preprint arXiv:1412.6980. [Google Scholar]
- [57].Kaloga Y, Borgnat P, Chepuri SP, Abry P, Habrard A, Variational graph autoencoders for multiview canonical correlation analysis, Signal Process. 188 (2021) 108182, 10.1016/j.sigpro.2021.108182. [DOI] [Google Scholar]