Briefings in Bioinformatics. 2025 Oct 10;26(5):bbaf532. doi: 10.1093/bib/bbaf532

MODA: a graph convolutional network-based multi-omics integration framework for unraveling hub molecules and disease mechanisms

Jinhui Zhao, Yanyan Zhou, Han Bao, Xinjie Zhao, Xinxin Wang, Chunxia Zhao, Wangshu Qin, Xin Lu, Guowang Xu
PMCID: PMC12513173  PMID: 41071612

Abstract

Advances in omics technologies provide unprecedented opportunities for systems biology, yet integrating multi-omics data remains challenging because of data complexity and heterogeneity and the sparsity of prior knowledge networks. Here, we introduce a multi-omics data integration analysis (MODA) framework that fully incorporates prior knowledge to identify hub molecules and pathways and to elucidate biological mechanisms. By leveraging multiple machine learning approaches, MODA transforms raw omics data into a feature importance matrix that is mapped onto a biological knowledge graph to mitigate omics data noise. It then uses graph convolutional networks with attention mechanisms to capture intricate molecular relationships and ranks molecules via a feature-selective layer. Finally, MODA transcends the limitations of predefined pathway annotations by employing an overlapping community detection algorithm to extract core functional modules involved in multiple pivotal disease pathways. Systematic evaluations show that MODA outperforms seven existing multi-omics integration methods in classification performance while maintaining biological interpretability. Moreover, MODA achieves superior stability on pan-cancer datasets. Application to multi-omics datasets of prostate cancer reveals a key role for carnitine and palmitoylcarnitine, regulated by BBOX1, in prostate cancer progression. Population samples and in vitro experiments further validate these findings. With high data utilization efficiency and low computational cost, MODA serves as a robust tool for uncovering novel disease mechanisms and advancing precision medicine.

Keywords: multi-omics integration; deep learning; graph convolutional networks; biological knowledge graph; prostate cancer

Introduction

Advances in analytical technologies have revolutionized various omics methods, including genomics, transcriptomics, proteomics, and metabolomics. These developments have generated vast datasets that provide invaluable insights into the molecular basis of biological processes, improving our understanding of phenotypic variation [1], genetic diseases [2], disease diagnosis and prognosis [3], and the roles of proteins in cellular function and cancer therapy [4], and informing medical research [5, 6]. However, each omics approach provides only a partial view of complex molecular regulatory networks [4]. Integrating multi-omics data is therefore necessary for a more comprehensive understanding of life processes.

Metabolomics reflects both endogenous metabolic pathways and external factors such as diet, drugs, toxins, and lifestyle choices. It plays a crucial role in bridging the gap between genotypes and phenotypes, providing unique insights into the complex gene–environment interactions [7]. Despite its growing significance, the integration of metabolomics with other omics data remains challenging due to the unique complexities of metabolomics data, including high dimensionality, variability, data sparsity, and missing values. These challenges are often inadequately addressed by conventional multi-omics integration approaches, which tend to focus on statistical correlations, network-based associations, or machine learning (ML) models that may not fully capture the nonlinear and context-dependent nature of biological systems.

The prevailing approaches to multi-omics data integration fall into three categories: statistical integration, network-based integration [8], and ML methods [9]. Statistical methods aim to identify shared patterns across datasets but often overlook the complex, nonlinear relationships inherent in biological data. Network-based approaches [10], though valuable for mapping molecular interactions, can oversimplify the complex, dynamic, and multifaceted nature of omics data integration. ML methods, especially ensemble models such as random forests (RFs) and gradient boosting, have succeeded in biomarker identification and classification tasks, benefiting from feature selection and robustness to noise [11]. However, these models still rely on handcrafted features or shallow representations, which limits their capacity to model complex biological processes. In contrast, deep learning (DL) has shown promise in uncovering hidden patterns in high-dimensional omics data. From a training perspective, popular ensemble methods (e.g. RFs) can become memory-intensive when applied to large-scale datasets [12]. DL, a rapidly advancing branch of ML, also supports pretrained models that enable transfer learning across related biological tasks, whereas ensemble models typically require retraining from scratch [13]. In particular, given the inherent properties of biological networks, graph-structured DL frameworks have been successfully applied to infer biological mechanisms [14] and assist in disease diagnosis [15]. Nevertheless, biological data, especially in multi-omics integration, often exhibit high heterogeneity and noise, which complicates effective feature extraction [16], increases the risk of overfitting, and limits interpretability [17]. Therefore, selecting appropriate input features and strategically combining the strengths of ML and DL is a promising direction for advancing multi-omics data integration.

To address these challenges, we introduce MODA, a multi-omics data integration analysis framework that combines graph convolutional networks (GCNs) with attention mechanisms and prior knowledge (Fig. 1). MODA is specifically designed to improve the integration of metabolomics with other omics data, enabling more accurate and interpretable multi-omics integration and facilitating the discovery of hub molecules and pathways in disease research. By using GCNs, MODA captures both omics-specific features and the intricate relationships among molecules, providing deeper insight into disease mechanisms. We demonstrate the efficacy of MODA in uncovering novel hub molecules and pathways across different stages of prostate cancer (PRAD), supported by both population-based and in vitro validation. Our findings underscore the potential of MODA as a robust tool for advancing precision medicine and enhancing our understanding of complex biological systems.

Figure 1.

Alt text: The two-step strategy of the MODA framework. The first step leverages ML to preprocess multi-omics data and construct a disease-specific biological network. The second step employs a graph convolutional neural network with a graph attention mechanism to capture key molecules and uses the CPM algorithm to identify biological functional modules.

Workflow of the GCN-based MODA framework for unraveling hub molecules and disease-specific modules. The framework consists of two main parts: disease-specific network construction and information mining. Disease-specific network construction integrates data from multiple open-access databases to generate a biological interaction knowledge graph. In the absence of metabolomics data, metabolic fluxes are predicted from transcriptomics profiles. Disease-specific multi-omics experimental data are collected and transformed, through ML and network topology analyses, into an attribution matrix of node importance scores. A disease-specific biological network is extracted from the knowledge graph using the experimental data, and the matrix is mapped onto this network to facilitate information mining. The DL framework processes the disease-specific biological network to prioritize key molecules and core modules. VIP, variable importance in projection; CPM, clique percolation method. Feat_nodes, nodes with experimentally measured omics data; Hidden_nodes, nodes without direct experimental measurements; predicted nodes, nodes with importance scores predicted by MODA.

Materials and methods

Omics data collection

Performance evaluation

We employed a stepwise strategy integrating public and in-house omics datasets to evaluate MODA's performance. The mRNA expression profiles (normalized as Fragments Per Kilobase of transcript per Million mapped fragments, FPKM), miRNA expression profiles, and phenotype data of prostate cancer (PRAD) were downloaded from The Cancer Genome Atlas (TCGA, https://portal.gdc.cancer.gov/) using the "TCGAbiolinks" R package. Samples were classified as cancerous tissues or adjacent normal tissues (ANTs) based on their sample labels (cancerous tissues, 01A; ANTs, 11A). Cancerous tissues were further divided into T2 and T3 stages according to the Tumor, Node, Metastasis (TNM) classification system. Because TCGA–PRAD does not provide metabolomics data, metabolic fluxes for the PRAD datasets were predicted from mRNA profiles using the bioinformatics pipeline developed by Lewis [18]. Code was executed in PyCharm (Professional Version 2022.2.3, JetBrains, Prague, Czech Republic). The COBRA Toolbox [19] was applied to simulate genome-wide knockouts and infer gene functions.
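As an illustration of the preprocessing above, the following Python sketch computes FPKM values from raw fragment counts and assigns tissue groups from TCGA sample barcodes. It is not the authors' pipeline (the paper used the TCGAbiolinks R package); the counts, gene lengths, and barcodes are hypothetical, and only the standard FPKM definition (fragments × 10^9 / (gene length in bp × total mapped fragments)) and the standard TCGA barcode layout are assumed.

```python
# Illustrative sketch only: standard FPKM formula and TCGA barcode parsing.
# All inputs are hypothetical; this is not the TCGAbiolinks workflow.
import numpy as np


def fpkm(counts, gene_lengths_bp):
    """Fragments Per Kilobase of transcript per Million mapped fragments.

    counts: (genes, samples) array of fragment counts.
    gene_lengths_bp: (genes,) array of gene lengths in base pairs.
    """
    counts = np.asarray(counts, dtype=float)
    lengths_kb = np.asarray(gene_lengths_bp, dtype=float)[:, None] / 1e3
    # Total mapped fragments per sample (column sum here), in millions
    per_million = counts.sum(axis=0, keepdims=True) / 1e6
    return counts / lengths_kb / per_million


def tissue_group(barcode):
    """Map a TCGA barcode to the groups used above (01A tumor, 11A normal).

    The fourth dash-separated field holds the two-digit sample type code
    plus the vial letter, e.g. 'TCGA-XX-XXXX-01A-...' -> '01A'.
    """
    fields = barcode.split("-")
    if len(fields) < 4:
        return None
    sample_vial = fields[3]
    if sample_vial == "01A":
        return "cancerous"
    if sample_vial == "11A":
        return "ANT"
    return None  # other sample types or vials are not used in this split
```

In real data the denominator is the total number of mapped fragments in each library; the column sum used here matches it only when the matrix covers every counted feature.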

To validate the key molecules and pathways identified in TCGA–PRAD, two independent datasets were collected from PRAD patients who underwent radical prostatectomy at Shanghai Changhai Hospital. These datasets include metabolomics, lipidomics, transcriptomics, and miRNA profiles together with comprehensive clinical annotations. Batch 1 included 21 pairs of cancerous tissues and matched ANTs, while Batch 2 included 50 cancerous tissues and 50 ANTs (49 paired). All experimental protocols were approved by the institutional review board, and informed consent was obtained from all subjects.

Generalization verification

To further evaluate the generalizability of MODA beyond prostate cancer, we obtained transcriptomic profiles of 21 cancer types from TCGA via the GDC portal (https://portal.gdc.cancer.gov/).

Multi-omics data integration analysis computational framework

Construction of disease-specific biological network

A disease-specific biological knowledge graph was assembled from multiple curated databases, including KEGG (https://www.genome.jp/kegg/), HMDB (https://hmdb.ca/), BRENDA (https://www.brenda-enzymes.org/), STRING (https://string-db.org/), iRefIndex (https://irefindex.vib.be/), HuRi (http://www.interactome-atlas.org/), TRRUST (http://www.grnpedia.org/trrust), and OmniPath ("OmniPath" R package). Interactions among metabolites, genes, enzymes, and miRNAs were standardized and deduplicated to generate a unified undirected graph.
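The standardization and deduplication step can be pictured with a small pure-Python sketch. The record format, the identifier-mapping table, and the choice to drop self-loops created by identifier harmonization are all assumptions made for illustration; the databases' real export formats differ and are not reproduced here.

```python
# Hypothetical sketch: merge interaction records from several sources into
# one undirected, deduplicated graph. Record layout and id_map are assumed.
from collections import defaultdict


def build_unified_graph(records, id_map):
    """records: iterable of (source_db, node_a, node_b) tuples.
    id_map: dict mapping raw identifiers or synonyms to one standard ID.
    Returns (adjacency dict node -> set of neighbors, set of edges).
    """
    edges = set()
    for _source_db, a, b in records:
        # Standardize identifiers; keep the raw ID when no mapping exists
        a, b = id_map.get(a, a), id_map.get(b, b)
        if a == b:
            continue  # assumption: drop self-loops produced by ID merging
        # Sorting the pair makes (a, b) and (b, a) the same undirected edge
        edges.add(tuple(sorted((a, b))))
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return dict(adjacency), edges
```

Using a sorted tuple as the edge key is what makes the graph undirected: the same interaction reported in either orientation, or by several databases, collapses to one edge.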

To generate initial feature representations, we applied complementary statistical and ML methods: t-tests (p-FDR), fold change (FC), RF, LASSO, and partial least squares discriminant analysis (PLS-DA). These methods produced feature-level importance scores that were normalized and integrated into a unified attribute matrix reflecting the contribution of each molecule to disease classification. Significant molecules from the different omics types were mapped onto the biological network as seed nodes, and a k-step neighborhood subgraph was constructed by expanding from these seeds. Based on an evaluation of k values from 1 to 5 (Supplementary Fig. S1), k = 2 was selected to balance network coverage while maintaining an approximately 1:1 ratio between Feat_nodes (nodes with experimental measurements) and Hidden_nodes (nodes without direct experimental measurements). Hidden_nodes represent molecules present in the knowledge graph but lacking direct experimental data in the study dataset; they capture unmeasured but biologically relevant components. Including these nodes allows the model to propagate information across the network, improving system-level inference. The constructed subgraph served as the input for the graph learning module.
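The seed-and-expand construction described above can be sketched as follows. Min-max scaling of each method's scores is an assumption (the text says only that scores were "normalized"), and the toy graph, scores, and measured-node set are hypothetical. The k-step expansion is a plain breadth-first search that stops at depth k and keeps the induced subgraph.

```python
# Hypothetical sketch of the attribute matrix and k-step subgraph steps.
# Min-max scaling per method is an assumed normalization.
from collections import deque

import numpy as np


def min_max(scores):
    """Rescale one method's importance scores to [0, 1]."""
    s = np.asarray(scores, dtype=float)
    span = s.max() - s.min()
    return np.zeros_like(s) if span == 0 else (s - s.min()) / span


def attribute_matrix(score_columns):
    """Rows = molecules, columns = methods (t-test, FC, RF, ...)."""
    return np.column_stack([min_max(col) for col in score_columns])


def k_hop_subgraph(adjacency, seeds, k=2):
    """All nodes within k steps of any seed, as an induced subgraph.

    adjacency: dict node -> set of neighbors, with every node as a key.
    """
    dist = {s: 0 for s in seeds if s in adjacency}
    queue = deque(dist)
    while queue:
        node = queue.popleft()
        if dist[node] == k:
            continue  # do not expand past k steps
        for nb in adjacency[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    nodes = set(dist)
    # Keep only edges whose endpoints are both inside the neighborhood
    return {n: adjacency[n] & nodes for n in nodes}


def split_nodes(subgraph, measured):
    """Feat_nodes have experimental data; Hidden_nodes do not."""
    feat = sorted(n for n in subgraph if n in measured)
    hidden = sorted(n for n in subgraph if n not in measured)
    return feat, hidden
```

Raising k enlarges the neighborhood quickly in a dense interaction graph, which is why the text tunes k against the ratio of Feat_nodes to Hidden_nodes rather than fixing it a priori.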

Graph representation learning and score prediction

Given the constructed graph $G = (V, E)$, where the node set $V$ includes Feat_nodes and Hidden_nodes, $E$ is the set of undirected edges, $X$ represents the feature matrix derived from multi-omics importance scores, and $Y$ is the class label (available only for Feat_nodes), MODA applies a two-layer GCN to propagate and refine node attributes. The GCN updates node attributes through an aggregation function and stacks multiple graph convolution layers as follows:

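$$h_{\mathcal{N}(i)}^{k} = \mathrm{AGG}_k\left(\left\{ h_j^{k-1}, \forall j \in \mathcal{N}(i) \right\}\right) \qquad (1)$$
$$h_i^{k} = \sigma\left( W^{k} \cdot \mathrm{CONCAT}\left( h_i^{k-1}, h_{\mathcal{N}(i)}^{k} \right) \right) \qquad (2)$$
$$h_i^{k} = \frac{h_i^{k}}{\lVert h_i^{k} \rVert_2} \qquad (3)$$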

where $h_i^{k}$ denotes the embedding of node $i$ at the $k$-th layer, $\mathrm{AGG}_k$ is the aggregator function, and $\mathcal{N}(i)$ is the set of neighbors of node $i$; $\sigma$ denotes the nonlinear activation function, and $W^{k}$ is the weight matrix at the $k$-th iteration. After weighted aggregation, each node's representation is updated with features from its neighbors via graph edges. This process refines and imputes node representations.

The Feat_nodes were randomly split 7:3 into training and validation sets, with the Hidden_nodes serving as the test set. The training set was used to optimize graph embeddings by integrating node attributes with importance scores derived from multiple ML methods and topological features. The root mean square error (RMSE) loss function measured the error between model predictions and actual values, and model parameters were optimized through supervised learning with stochastic gradient descent. The trained model then predicted the importance scores of Hidden_nodes. Finally, the clique percolation method (CPM), an unsupervised clustering algorithm, detected network communities based on the learned graph embeddings. Hyperparameters of the DL model are summarized in Supplementary Table S1, and the MODA pseudocode and schematic diagram are provided in Algorithm 1 and Supplementary Fig. S2, respectively.
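The training protocol above (7:3 split of Feat_nodes, RMSE loss, stochastic gradient descent) can be illustrated with a minimal NumPy sketch. The linear readout from embeddings to importance scores and the learning rate are hypothetical stand-ins for MODA's actual model and hyperparameters (Supplementary Table S1); minimizing squared error per sample also minimizes RMSE, since the square root is monotonic.

```python
# Minimal sketch, assuming a linear readout from node embeddings to
# importance scores; the readout and hyperparameters are hypothetical.
import numpy as np


def split_feat_nodes(feat_nodes, train_frac=0.7, seed=0):
    """Random 7:3 split of Feat_nodes into training and validation sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(feat_nodes))
    n_train = int(round(train_frac * len(feat_nodes)))
    train = [feat_nodes[i] for i in order[:n_train]]
    val = [feat_nodes[i] for i in order[n_train:]]
    return train, val


def rmse(pred, target):
    """Root mean square error between predicted and observed scores."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.sqrt(np.mean((pred - target) ** 2)))


def fit_linear_readout(Z, y, lr=0.1, epochs=500, seed=0):
    """Per-sample SGD on squared error for y ~ Z @ w + b."""
    rng = np.random.default_rng(seed)
    Z = np.asarray(Z, dtype=float)
    y = np.asarray(y, dtype=float)
    w, b = np.zeros(Z.shape[1]), 0.0
    for _ in range(epochs):
        for i in rng.permutation(len(y)):  # shuffle each epoch
            err = Z[i] @ w + b - y[i]
            w -= lr * err * Z[i]  # gradient of 0.5 * err**2 w.r.t. w
            b -= lr * err
    return w, b
```

Hidden_nodes never enter the split; in this scheme they receive predictions only after training, which matches their role as the test set described above.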

Algorithm 1: The overall process of MODA.

Input:

$G = (V, E)$: Biological graph (nodes V, edges E).

$X$: Node feature matrix.

K: Propagation layers (depth).

$W^{k}$: Weight matrices for layer k.

Output:

Z: final node embeddings.

The community assignments: Community detection via CPM.

1: Initialize: node representations $h_i^{0} \leftarrow x_i, \forall i \in V$

2: Define: $\sigma$ (nonlinear activation), $\mathrm{AGG}_k$ (differentiable aggregator), $\mathcal{N}$ (neighborhood function)

3: for k = 1,…, K do

4:  for each $i \in V$ do

5: $h_{\mathcal{N}(i)}^{k} \leftarrow \mathrm{AGG}_k\left(\{h_j^{k-1}, \forall j \in \mathcal{N}(i)\}\right)$  # Aggregate neighbor features

6: $h_i^{k} \leftarrow \sigma\left(W^{k} \cdot \mathrm{CONCAT}\left(h_i^{k-1}, h_{\mathcal{N}(i)}^{k}\right)\right)$  # Update node embedding

7:  end for

8: $h_i^{k} \leftarrow h_i^{k} / \lVert h_i^{k} \rVert_2, \forall i \in V$

9: end for

10: $Z \leftarrow \{ z_i = h_i^{K}, \forall i \in V \}$

11: Perform community detection with CPM on $Z$

12: return Z, Community assignments
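The propagation loop in Algorithm 1 can be sketched in NumPy. This sketch assumes a GraphSAGE-style update (each node's previous embedding concatenated with an aggregate of its neighbors, multiplied by the layer weight matrix, passed through ReLU, then L2-normalized) and uses a simple mean aggregator; MODA's attention-based aggregation and its trained weights are not reproduced, and the weights here are random placeholders.

```python
# Sketch of K rounds of neighborhood aggregation, assuming a mean
# aggregator and ReLU; MODA's attention weighting is not reproduced.
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def sage_forward(adjacency, X, weights, activation=relu):
    """adjacency: list of neighbor-index lists, one per node.
    X: (n_nodes, d0) feature matrix.
    weights: list of W^k arrays, each of shape (2 * d_{k-1}, d_k).
    Returns Z = h^K, one row per node.
    """
    H = np.asarray(X, dtype=float)
    for W in weights:  # k = 1, ..., K
        H_neigh = np.zeros_like(H)
        for i, nbrs in enumerate(adjacency):
            if nbrs:  # isolated nodes keep a zero neighbor aggregate
                H_neigh[i] = H[nbrs].mean(axis=0)
        # CONCAT(h_i^{k-1}, h_N(i)^k) followed by the linear map and sigma
        H_new = activation(np.concatenate([H, H_neigh], axis=1) @ W)
        norms = np.linalg.norm(H_new, axis=1, keepdims=True)
        H = H_new / np.where(norms == 0, 1.0, norms)  # L2 normalization
    return H


# Usage: a three-node path graph 0-1-2 with two layers (K = 2)
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 2))
weights = [rng.normal(size=(4, 8)), rng.normal(size=(16, 3))]
Z = sage_forward([[1], [0, 2], [1]], X, weights)
```

The row-vector convention here (`concat @ W`) is the transpose of the column-vector form $W^{k} \cdot \mathrm{CONCAT}(\cdot)$; both describe the same linear map.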

The score of each community was then calculated and ranked based on the importance scores of its nodes. Using these scores, MODA extracted the hub molecules. All hub molecules were divided into Feat_nodes (with experimental omics data) and Hidden_nodes (without experimental data).
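A self-contained sketch of clique percolation and community ranking follows. CPM defines a community as a union of k-cliques reachable through adjacent k-cliques (sharing k − 1 nodes); the equivalent formulation used here merges maximal cliques of size ≥ k that share at least k − 1 nodes, and nodes may belong to several communities. Two points are assumptions: the sketch runs CPM on graph topology (how MODA feeds learned embeddings into CPM is not specified here), and it scores a community by the mean importance of its members, since the text does not state the aggregation.

```python
# Sketch of the clique percolation method (CPM) plus community ranking.
# Mean-of-members scoring is an assumption, not the paper's formula.
from itertools import combinations


def maximal_cliques(adj):
    """Bron-Kerbosch without pivoting; adj: dict node -> set of neighbors."""
    cliques = []

    def expand(R, P, X):
        if not P and not X:
            cliques.append(R)
            return
        for v in list(P):
            expand(R | {v}, P & adj[v], X & adj[v])
            P = P - {v}
            X = X | {v}

    expand(set(), set(adj), set())
    return cliques


def clique_percolation(adj, k=3):
    """Overlapping communities: maximal cliques (size >= k) sharing >= k-1 nodes."""
    cliques = [c for c in maximal_cliques(adj) if len(c) >= k]
    parent = list(range(len(cliques)))

    def find(i):  # union-find root with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) >= k - 1:
            parent[find(i)] = find(j)
    groups = {}
    for i, c in enumerate(cliques):
        groups.setdefault(find(i), set()).update(c)
    return list(groups.values())


def score_communities(communities, node_scores):
    """Rank communities by the mean importance score of their nodes."""
    scored = [(sum(node_scores[n] for n in c) / len(c), c) for c in communities]
    return sorted(scored, key=lambda t: t[0], reverse=True)
```

Enumerating maximal cliques is exponential in the worst case, so a production pipeline would use an optimized implementation (for example, a graph library's k-clique community routine) rather than this teaching version.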

Performance evaluation of multi-omics data integration analysis

We benchmarked MODA against several popular analysis pipelines and multi-omics integration methods using PRAD datasets. Two established R-based multi-omics integration methods, MOFA2 [20] (R package version 0.99.5) and mixOmics [21] (R package version 6.10.9), were included for comparison. Both were implemented strictly according to their official documentation with the recommended default parameters. Because MOFA2 could not process the high-dimensional PRAD multi-omics data, we reduced dimensionality by retaining the top 1500 genes ranked by ML-derived importance scores, while including all metabolites given their smaller number. For both methods, we assessed the relative influence of each omics type by examining their feature weights.

Additionally, three DL methods were selected for comparison with MODA: MOMA [22], MoGCN [23], and OmiEmbed [24]. MOMA applies geometric feature vectorization with an attention mechanism to prioritize important modules in multi-omics data. MoGCN [23] originally required three omics types as input, which limited its use; we therefore modified its original code to integrate the two-omics PRAD data used in this study.

Multi-omics data integration analysis generalization experiment

MODA was applied to transcriptomic data from 21 cancer types in TCGA following standard operating procedures. We assessed the specificity of the hub molecules and pathways captured in different cancers by calculating the Jaccard index (JI):

$$\mathrm{JI}(C_1, C_2) = \frac{\lvert C_1 \cap C_2 \rvert}{\lvert C_1 \cup C_2 \rvert}$$

where $C_1$ and $C_2$ are the sets of hub molecules (or crucial pathways) identified in two different cancers.
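The Jaccard index is simple enough to compute directly; the sketch below also builds the pairwise matrix across cancers that such a specificity comparison implies. Returning 0.0 when both sets are empty is a convention chosen here, and the hub sets in the example are hypothetical.

```python
# Jaccard index between hub-molecule (or pathway) sets of two cancers.
def jaccard_index(c1, c2):
    """|C1 & C2| / |C1 | C2|; defined as 0.0 when both sets are empty."""
    c1, c2 = set(c1), set(c2)
    union = c1 | c2
    return len(c1 & c2) / len(union) if union else 0.0


def pairwise_jaccard(hubs_by_cancer):
    """JI for every ordered pair of cancers; keys are (cancer_a, cancer_b)."""
    names = sorted(hubs_by_cancer)
    return {
        (a, b): jaccard_index(hubs_by_cancer[a], hubs_by_cancer[b])
        for a in names
        for b in names
    }


# Usage with hypothetical hub sets
example = pairwise_jaccard({"PRAD": {"BBOX1", "C00318"}, "BRCA": {"C00318", "ESR1"}})
```

Low off-diagonal values indicate that the hub molecules or pathways captured for one cancer are largely specific to it.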

Omics analysis

Detailed procedures for profiling the two independent validation cohorts are provided below.

Metabolomics analysis

The metabolomics analysis and data processing were described previously [25, 26]. Briefly, 10 mg of wet tissue was homogenized with a porcelain bead and mixed with 300 μL of ice-cold methanol containing internal standards. After methyl tert-butyl ether (MTBE) was added, the mixtures were centrifuged to separate the MTBE-rich and methanol/water-rich phases. Metabolites were separated on an ACQUITY™ ultra-performance liquid chromatography (UPLC) system (Waters Corporation, Manchester, UK) with BEH C8 1.7 μm (2.1 × 100 mm) and HSS T3 1.8 μm (2.1 × 100 mm) columns in both positive and negative ion modes. Full-scan TOF MS data were acquired on a TripleTOF 5600 Plus mass spectrometer (Applied Biosystems, Foster City, CA) coupled to the UPLC system.

Lipidomics analysis

As described previously [27], eight lipid internal standards were added. Samples were prepared and extracted from homogenized mixtures according to our group’s standardized process and then analyzed using a UPLC system coupled with a TripleTOF 5600 Plus for global lipidomics profiling.

mRNA and miRNA profiling

Transcriptomics analysis was performed by RNA-seq as previously described [25, 28]. miRNA expression was profiled according to established protocols [25]. miRNA target genes were predicted using PITA (https://genie.weizmann.ac.il/pubs/mir07/mir07_data.html).

Cell experiment

Cell experiments were performed using 293T and DU145 cell lines cultured under standard conditions. Detailed methods for lentiviral transduction (siRNA structure, Supplementary Fig. S3), Western blotting, CCK-8, and colony formation assays are provided in Supplementary data 1.

Results

Multi-omics data integration analysis constructed disease-specific biological networks by integrating prior knowledge with multi-omics data

We constructed an initial biological knowledge graph comprising 660 072 edges and 8031 metabolites, 8073 genes, 8902 proteins, 4172 enzymes, and 2597 miRNAs. We applied MODA to a PRAD case study using TCGA multi-omics datasets, including transcriptomics and miRNA expression data. Because metabolomics data were unavailable, metabolic fluxes were predicted from the transcriptomics data with a genome-scale metabolic network (GSMN) model based on Recon3D, covering 831 metabolites. Thus, three types of omics data (transcriptomics, miRNA, and metabolomics) were used for classification, providing complementary insights into PRAD stages. PCA demonstrated a clear separation between ANTs and T2/T3 stages, with a subtler distinction between T2 and T3 (Supplementary Fig. S4).

We evaluated the importance scores of omics molecules using univariate analysis (Supplementary Tables S2–S4), ML algorithms, and network topology indices (degree, closeness, betweenness, and eigenvector centrality). Across the three pairwise comparisons among ANT, T2, and T3, we identified 2319, 2872, and 164 differentially expressed genes (DEGs); 350, 388, and 201 differentially expressed miRNAs (DEMi); and 107, 87, and 38 differential metabolites (DEMe) (P-FDR < .05; |log2FC| > 1 for genes and miRNAs; |log2FC| > 0.26 for metabolites). Forty-three key molecules (2 metabolites, 38 genes, and 3 miRNAs) showed consistent differential expression across multiple comparisons. These molecules were further validated by RF analysis (ntree = 200), with importance scores > 1 (Supplementary Table S5). A classification model built on these molecules achieved a mean AUC of 0.778 (SD = 0.068) in five-fold cross-validation and an AUC of 0.788 (95% CI: 0.638–0.916) on the validation set (Fig. 2A). Subsequently, a PRAD-specific network was constructed by extracting the two-hop neighbors of the 43 initially important molecules from the initial biological knowledge graph after removing high-degree nodes (degree > 100). The resulting network contained 473 enzymes, 117 genes, 986 metabolites, 5517 proteins, and 78 653 protein–protein interactions (Fig. 2B–D). After the multi-omics dataset was mapped onto the PRAD-specific network, nodes were categorized into 4280 Feat_nodes (with mapped omics data) and 2813 Hidden_nodes (without mapped omics data).
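The differential-expression filter above combines an adjusted P-value cutoff with a fold-change cutoff. The sketch below assumes that "P-FDR" refers to Benjamini–Hochberg adjusted P values, which the text does not state explicitly; the P values and fold changes in the example are hypothetical.

```python
# Sketch of an FDR + fold-change filter, assuming Benjamini-Hochberg
# adjustment for "P-FDR" (the correction method is not named in the text).
import numpy as np


def bh_fdr(pvalues):
    """Benjamini-Hochberg adjusted P values (q-values), in input order."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity: each q is the min over itself and larger ranks
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(n)
    q[order] = np.minimum(ranked, 1.0)
    return q


def differential_mask(pvalues, log2fc, fdr_cut=0.05, lfc_cut=1.0):
    """True where q < fdr_cut and |log2FC| > lfc_cut.

    Use lfc_cut=1.0 for genes and miRNAs, 0.26 for metabolites.
    """
    q = bh_fdr(pvalues)
    return (q < fdr_cut) & (np.abs(np.asarray(log2fc, dtype=float)) > lfc_cut)


# Usage with hypothetical values for four molecules
mask = differential_mask([0.01, 0.04, 0.03, 0.20], [2.0, -3.0, 0.5, 4.0])
```

A |log2FC| cutoff of 0.26 corresponds to roughly a 1.2-fold change (2^0.26 ≈ 1.2), a common, looser threshold for metabolites than the 2-fold cutoff used for genes and miRNAs.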

Figure 2.

Alt text: Composite figure with four panels (A–D) summarizing multi-omics and network information. (A) ROC curve with AUC = 0.788 and an inset bar plot showing the proportion of metabolite, miRNA, and mRNA features. (B) Pie chart displaying the percentage distribution of omics molecule categories in the knowledge graph. (C) Pie chart illustrating the proportions of different molecular interaction types in the knowledge graph. (D) Schematic diagram of the PRAD-specific biological network.

Construction of the PRAD-specific biological network through prior knowledge and multi-omics data. (A) ROC analysis of the classification model built on the 43 initially important molecules, which an RF model selected from molecules that were consistently differential across multiple comparisons in univariate and ML analyses. The bar chart shows the proportion of each molecule type among the initially important molecules, and the boxplot presents the cross-validation performance of the RF model. (B) Node composition and (C) interaction composition of the PRAD-specific biological network, with colors representing distinct node types or interaction relationships. (D) A schematic diagram of the PRAD-specific biological network. AUC, area under the curve.

Multi-omics data integration analysis effectively extracted crucial information from observed and unobserved molecules

MODA used the PRAD-specific network and the importance score matrix as input to predict importance scores for Hidden_nodes, for which no measured scores were available during training; this makes MODA suitable for inductive learning tasks. The loss curve (RMSE) showed a good fit during training and converged well on the validation set (Fig. 3A). The trained MODA model was then used to predict the importance scores of Hidden_nodes from the refined graph embeddings (Supplementary Table S6). Notably, the distribution of importance scores for Hidden_nodes closely resembled that of Feat_nodes, indicating no significant bias in the predictions (Fig. 3B). Combining the scores of Feat_nodes and Hidden_nodes further enhanced the ability of these molecules to distinguish between PRAD stages.

Figure 3.

Alt text: Composite figure with five panels (A–E) illustrating results of a DL analysis and function analysis. (A) Line plot of RMSE loss over iterations. (B) Density plot comparing importance scores of feature nodes and hidden nodes. (C) Bar plot showing score distribution across different community IDs. (D) Sankey diagram depicting relationships between protein/metabolite communities and broader functional pathways. (E) Dot plot (top) highlighting significance of pathways.

Training the MODA framework to identify hub molecules and disease-related modules. (A) The convergence of the loss function during DL, with training (circle) and validation (triangle). (B) Density plots of importance scores for Feat_nodes and Hidden_nodes. (C) Proportion and scoring of disease-related functional communities. The bar plot displays scores for different functional communities using distinct colors. The left pie chart shows the proportion of nodes within functional communities, while the right pie chart indicates the percentage of molecules within these communities. (D) Sankey plot illustrating the relationships between functional communities and associated biological processes. (E) Importance scores of significant pathways from enrichment analysis. The bubble size represents the pathway importance.

Compared with traditional pathway enrichment analysis, the MODA framework effectively captured core modules and removed irrelevant information. As shown in Fig. 3C and Table 1, seven PRAD-related communities were extracted by CPM and ranked by community importance score. Low-scoring communities were merged and defined as unrelated communities (98.87% of nodes). Molecules in these seven disease-related communities are summarized in Supplementary Table S7. The two top-ranked communities were community_33 (score: 5.704) and community_20 (score: 5.458). Exploring the biological functions of the seven important communities revealed PRAD-specific modules, and enrichment analysis highlighted distinct metabolic patterns for each module. Community_20 (Fig. 3D) was mainly associated with metabolic pathways, including lipid metabolism, glycan biosynthesis, and amino acid metabolism (Supplementary Table S8), with fatty acid degradation showing the highest enrichment score (Fig. 3E). Other PRAD-related modules, such as community_71 and community_150, were associated with energy metabolism, signal transduction, translation, and carbohydrate metabolism.

Table 1.

The information on the disease-related communities

Community             Score   Rank   Number of nodes
Community_33          5.704   1      4
Community_20          5.458   2      4
Community_150         1.234   3      15
Community_213         1.177   4      16
Community_71          0.936   5      23
Community_354         0.429   6      6
Community_245         0.193   7      12
Unrelated community   –       –      7013

Unlike most multi-omics tools, which select hub molecules based on statistical significance, MODA identifies hub molecules by integrating both change trends and biological functions, considering each node's importance score together with its membership in disease-related modules. In total, we identified 58 high-scoring hub nodes within disease-related communities (Supplementary Table S7), including the Hidden_node C00318 (carnitine), which had the highest score in community_20 (score: 4.172).

Multi-omics data integration analysis mined hidden information to enhance classification performance

To assess the classification performance of MODA, two additional batches of clinical samples from PRAD patients were profiled for metabolomics, mRNA, and miRNA expression. Baseline information revealed no significant differences in Gleason grade or Prostate-Specific Antigen (PSA) levels between Batches 1 and 2 (Table 2, P > .05). Of the 58 disease-related molecules identified by MODA, 52 were detected in both datasets, including one Hidden_node (C00318, carnitine) (Supplementary Table S9). Differential analyses across PRAD stages are also summarized in Supplementary Table S9. Among these molecules, 20 showed significant differences in at least one comparison, supporting the feasibility and interpretability of MODA for selecting key molecules.

Table 2.

Baseline information on the two batch sets

Batch     Group/stage                  Number   Gleason          PSA
Batch 1   ANT                          21       7.65 ± 1.182     16.660 ± 14.804
          Cancerous                    19       7.53 ± 1.073     17.080 ± 15.091
          Localized                    13       7.23 ± 0.832     13.436 ± 7.915
          Metastatic                   6        8.17 ± 1.329     24.967 ± 23.670
          Localized vs metastatic, Z            0.14             0.335
          Localized vs metastatic, P            .179             .368
Batch 2   ANT                          50       7.38 ± 1.008     23.450 ± 18.815
          Cancerous                    50       7.40 ± 1.050     23.360 ± 18.870
          Localized                    31       7.35 ± 1.082     20.565 ± 16.186
          Metastatic                   19       7.47 ± 1.020     27.915 ± 22.301
          Localized vs metastatic, Z            0.529            1.369
          Localized vs metastatic, P            .597             .171

To comprehensively evaluate classification performance, we benchmarked the 52 detectable key molecules identified by MODA against seven established methods in a three-class task distinguishing ANT, localized, and metastatic prostate cancer. The comparison included a traditional approach (RF), two R-based multi-omics integration tools (MOFA2 and mixOmics), and three DL-based models (MOMA, MoGCN, and OmiEmbed). Notably, OmiEmbed [29] operates as a black-box model without feature attribution and therefore could not identify key disease-associated molecules, so it was excluded from the comparisons. PSA levels were assessed as a clinical comparator; however, because PSA is measured in blood and cannot differentiate cancerous tissue from ANT, it allowed only binary classification of localized versus metastatic cases.

The MODA-derived panel outperformed the other methods (Batch 1 AUC = 0.923; Batch 2 AUC = 0.806; P < .05) (Fig. 4A), except in Batch 2, where its performance did not differ significantly from that of MoGCN (P = .745). MODA's diagnostic model also significantly outperformed the traditional clinical indicator PSA in distinguishing patients at different stages (Batch 1: AUC, 0.641; Batch 2: AUC, 0.616; P < .05). As shown in Supplementary Table S10, MODA achieved perfect results for the ANT class in Batch 1 (recall = 1.000, specificity = 1.000, F1-score = 1.000, PR-AUC = 1.000) and near-perfect results in Batch 2 (recall = 1.000, F1-score = 0.952, PR-AUC = 0.983), demonstrating high accuracy in distinguishing nontumor samples. For the localized class, a more biologically heterogeneous group, MODA maintained balanced performance across metrics (Batch 1: F1-score = 0.818, specificity = 1.000, PR-AUC = 0.888; Batch 2: F1-score = 0.793, specificity = 0.942, PR-AUC = 0.778), outperforming or matching competing methods on key metrics such as specificity. For the metastatic class, MODA showed strong sensitivity (Batch 1: recall = 1.000, F1-score = 0.750, PR-AUC = 0.955; Batch 2: recall = 0.789, F1-score = 0.811, PR-AUC = 0.876), indicating a robust capability for detecting advanced disease. Notably, MODA consistently outperformed the other baseline methods in PR-AUC across batches, especially in the ANT and metastatic classes. Calibration curves also indicated that the MODA model performed more stably than the other methods (Fig. 4B).
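The per-class recall, specificity, and F1-score reported for the three-class task are one-vs-rest quantities: each class is treated in turn as "positive" and the other two as "negative". The sketch below computes them from predicted labels; the labels in the example are hypothetical, and threshold-free metrics such as ROC-AUC and PR-AUC, which require predicted probabilities, are not covered.

```python
# One-vs-rest per-class metrics for a multi-class task (ANT / Localized /
# Metastatic). Labels in the usage example are hypothetical.
import numpy as np


def one_vs_rest_metrics(y_true, y_pred, classes):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    out = {}
    for c in classes:
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        tn = np.sum((y_pred != c) & (y_true != c))
        recall = tp / (tp + fn) if tp + fn else 0.0  # sensitivity
        specificity = tn / (tn + fp) if tn + fp else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        out[c] = {"recall": float(recall), "specificity": float(specificity),
                  "precision": float(precision), "f1": float(f1)}
    return out


# Usage with six hypothetical samples
y_true = ["ANT", "ANT", "Localized", "Localized", "Metastatic", "Metastatic"]
y_pred = ["ANT", "ANT", "Localized", "Metastatic", "Metastatic", "Metastatic"]
metrics = one_vs_rest_metrics(y_true, y_pred, ["ANT", "Localized", "Metastatic"])
```

Because F1 is the harmonic mean of precision and recall, a class can show perfect recall but a modest F1 when precision is low, as with a recall of 1.000 alongside an F1-score of 0.750.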

Figure 4.

Alt text: Composite figure with five panels (A–E) comparing the performance of multiple computational methods in the classification test. (A) ROC curves for different methods across the two batches. (B) Calibration plots for the same methods as in (A). (C) ROC curves comparing the performance of the feature groups Feat, Hidden, and Feat + Hidden. (D) Calibration plots for the Feat, Hidden, and Feat + Hidden groups. (E) Dot plots showing prediction distributions for the ANT, Localized, and Metastatic groups.

Classification performance of the hub molecules validated in the independent dataset. (A) Comparisons of classification performance between MODA and other methods in Batch 1 (left) and Batch 2 (right). (B) Calibration curves for MODA and other models in Batch 1 (left) and Batch 2 (right). (C) Comparisons of classification performance among the Feat + Hidden, Feat, and Hidden hub molecule panels in Batch 1 (left) and Batch 2 (right). (D) Calibration curves of the Feat + Hidden, Feat, and Hidden panels in Batch 1 (left) and Batch 2 (right). (E) Bee swarm plots visualizing the predicted results from Batch 1 (left) and Batch 2 (right). The Feat + Hidden panel includes the 52 hub molecules detectable in the independent dataset among the 58 hub molecules identified by MODA. The Hidden hub molecule is C00318, which was unmeasured in the discovery set (a Hidden_node) but is detectable in the independent validation set, enabling experimental confirmation of the MODA prediction.

We further validated the role of Hidden_nodes in classification tasks. The 52 detectable hub molecules identified by MODA were used to build three panels: Feat + Hidden (N = 52), Feat (N = 51), and Hidden (C00318; N = 1). The Hidden panel alone showed excellent classification performance in both Batch 1 (AUC, 0.926) and Batch 2 (AUC, 0.952) (Fig. 4C). Including Hidden_nodes in the diagnostic panel significantly improved overall performance and stability (Fig. 4D), particularly in Batch 2 (Hidden versus Feat + Hidden, P = .008; Hidden versus Feat, P < .001). Bee swarm plots (Fig. 4E) also showed improved separation between PRAD stages when Hidden_nodes were included. Together, these results highlight the contribution of Hidden_nodes to the model's diagnostic capability.
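The P values above compare the AUCs of two panels scored on the same patients. The text does not state which test was used (DeLong's test is a common choice for paired ROC curves). As a hedged, dependency-free alternative, the sketch below uses a paired bootstrap: resample patients with replacement, recompute both panels' AUCs on the same resample, and derive a two-sided P value from how often the AUC difference falls on either side of zero. The function name `paired_bootstrap_auc_test`, its defaults, and the toy scores are illustrative assumptions.

```python
import random


def roc_auc(scores, labels):
    """Rank-based AUC; labels are 1 (positive) or 0 (negative), ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def paired_bootstrap_auc_test(scores_a, scores_b, labels, n_boot=2000, seed=0):
    """Return (observed AUC_a - AUC_b, two-sided paired-bootstrap P value)."""
    rng = random.Random(seed)
    n = len(labels)
    observed = roc_auc(scores_a, labels) - roc_auc(scores_b, labels)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # same patients for both panels
        lab = [labels[i] for i in idx]
        if all(lab) or not any(lab):
            continue  # a resample with a single class has no defined AUC
        diffs.append(roc_auc([scores_a[i] for i in idx], lab)
                     - roc_auc([scores_b[i] for i in idx], lab))
    below = sum(d <= 0 for d in diffs) / len(diffs)
    above = sum(d >= 0 for d in diffs) / len(diffs)
    return observed, min(1.0, 2 * min(below, above))


# Hypothetical example: panel A separates the classes perfectly, panel B is uninformative.
labels = [1] * 10 + [0] * 10
panel_a = [0.9] * 10 + [0.1] * 10
panel_b = [0.5] * 20
print(paired_bootstrap_auc_test(panel_a, panel_b, labels))  # (0.5, 0.0)
```

A bootstrap P value of 0.0 should be reported as P < 1/n_boot, since it only means no resample crossed zero.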

Multi-omics data integration analysis uncovered novel mechanisms linked to prostate cancer

We performed differential analysis of all molecules in the validation sets to assess pathway dysregulation. DEMe and DEMi (P < .05, |log2FC| > 0.26) and DEGs (P < .05, |log2FC| > 1) were selected for the cancerous versus ANT comparison and for the localized versus metastatic comparison. Detailed results are summarized in Supplementary Tables S11–S13 (Batch 1) and Supplementary Tables S14–S16 (Batch 2) (Supplementary data). PCA score plots showed clear separation between PRAD stages (Supplementary Fig. S5A–S5F). Significant pathways were identified across both batches (Supplementary Fig. S6A). Notably, pathways related to cell growth and death, lipid metabolism, carbohydrate metabolism, and amino acid metabolism were confirmed, with "cellular processes" achieving the highest score (|log10p|). Lipid metabolism and downstream correlated pathways also played important roles in PRAD progression, consistent with MODA's predictions from the training data (Supplementary Fig. S6B). Further lipidomics analysis revealed statistically significant changes in four lipid classes throughout PRAD progression: free fatty acids (FFAs), phosphatidylcholine (PC), phosphatidylethanolamine (PE), and triacylglycerol (TAG) (Supplementary Fig. S7). FFA levels were elevated, particularly those of long-chain fatty acids, whereas PC and PE levels increased in localized PRAD but decreased as the disease progressed further. TAGs showed the opposite trend.
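The selection rule above combines a P-value cut-off with a fold-change cut-off. Note that |log2FC| > 0.26 corresponds to a fold change of about 1.2 (log2 1.2 ≈ 0.263), while |log2FC| > 1 corresponds to a 2-fold change. The sketch below is a minimal illustration under stated assumptions: fold changes are computed as log2 ratios of group means on non-log-transformed intensities, P values are taken as already computed by a separate statistical test, and all feature names and values are invented.

```python
import math
from statistics import mean


def log2_fold_change(case, control):
    """log2 of the ratio of group means (assumes positive, non-log intensities)."""
    return math.log2(mean(case) / mean(control))


def select_differential(p_values, fold_changes, p_cut=0.05, lfc_cut=0.26):
    """Features with p < p_cut and |log2FC| > lfc_cut, in sorted order."""
    return sorted(f for f, p in p_values.items()
                  if p < p_cut and abs(fold_changes[f]) > lfc_cut)


# Hypothetical intensities per group; P values assumed to come from a separate test.
case = {"M1": [2.4, 2.6], "M2": [2.1, 2.1], "M3": [4.0, 4.2]}
control = {"M1": [2.0, 2.0], "M2": [2.0, 2.0], "M3": [2.0, 2.0]}
p_values = {"M1": 0.01, "M2": 0.001, "M3": 0.01}
lfc = {m: log2_fold_change(case[m], control[m]) for m in case}

dem = select_differential(p_values, lfc)               # metabolite/miRNA cut-off (~1.2-fold)
degs = select_differential(p_values, lfc, lfc_cut=1)   # gene cut-off (2-fold)
print(dem, degs)  # ['M1', 'M3'] ['M3']
```

M2 is highly significant but fails the fold-change filter, which is the point of requiring both criteria.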

To further investigate the mechanisms underlying the key disease processes identified by MODA, we applied in silico gene knockdown and experimental validation to the core disease-related functional modules. Molecules found significant by both MODA and traditional methods were defined as Seed_nodes: C00882 (dephospho-CoA), C02990 (palmitoylcarnitine), TROAP, C11orf87, and HOXB5 (Table 3 and Supplementary Fig. S8). Together with the Hidden_node C00318 (carnitine), these six molecules were selected for further biological validation. In an external dataset, TROAP, palmitoylcarnitine, and carnitine still showed significant trends with PRAD progression (Supplementary Table S9). Given their biological relevance, palmitoylcarnitine and carnitine were prioritized for further analysis.

Table 3.

Seed_nodes and the Hidden_node in disease-related communities

Molecule    Type         Community        Scores   Feat_nodes   Hidden_nodes
C00882      Metabolite   Community_33     6.263    ✓
TROAP       Protein      Community_213    4.929    ✓
C02990      Metabolite   Community_20     3.718    ✓
C11orf87    Protein      Community_71     2.203    ✓
HOXB5       Protein      Community_150    1.965    ✓
C00318      Metabolite   Community_20     4.172                 ✓

Using the GSMN, we identified 42 and 19 genes that significantly influenced the metabolic flux of carnitine and palmitoylcarnitine, respectively (Fig. 5A). Nine genes (SETDB2, BBOX1, DOT1L, KMT5C, SETD1A, KMT2B, SETDB1, ALDH9A1, and SETD1B) were shared (Fig. 5B). DOT1L [30, 31], KMT5C [32], SETD1A [33], KMT2B [34], SETDB1 [30], ALDH9A1 [35], and SETD1B [36] have previously been reported to regulate PRAD progression, whereas the role of BBOX1 in PRAD has not been explored. Using the COBRA Toolbox, we predicted the metabolic dysfunction caused by silencing BBOX1 in the TCGA dataset and validated it in DU145 cells (Fig. 5C). The results indicated that BBOX1 is involved in steroid hormone biosynthesis and fatty acid elongation.
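In silico knockdown in a constraint-based model works by mapping reduced gene activity through gene-protein-reaction (GPR) rules onto reaction flux bounds and then re-optimizing the flux toward a target metabolite. The authors used the COBRA Toolbox on a full GSMN; the toy sketch below replaces the linear-programming step with a hypothetical linear chain loosely modeled on the last steps of carnitine biosynthesis (TMLHE, ALDH9A1, BBOX1; the intermediate aldolase step is omitted). In such a chain every reaction carries the same steady-state flux, so the maximum equals the tightest scaled bound. The bounds, the `residual` activity level, and the AND→min / OR→max mapping are illustrative assumptions, not MODA's settings.

```python
from typing import Dict, Tuple, Union

# A GPR rule is a gene id or a tuple ("and" | "or", child, child, ...).
Rule = Union[str, Tuple]


def rule_activity(rule: Rule, activity: Dict[str, float]) -> float:
    """Map gene activities in [0, 1] through a GPR rule: AND -> min, OR -> max."""
    if isinstance(rule, str):
        return activity.get(rule, 1.0)  # genes not listed are fully active
    op, *children = rule
    values = [rule_activity(child, activity) for child in children]
    return min(values) if op == "and" else max(values)


def rule_genes(rule: Rule) -> set:
    """All gene ids mentioned in a GPR rule."""
    if isinstance(rule, str):
        return {rule}
    return set().union(*(rule_genes(child) for child in rule[1:]))


# Hypothetical chain ending in carnitine: (reaction, GPR rule, upper bound in arbitrary units).
CHAIN = [
    ("TML hydroxylation", "TMLHE", 10.0),
    ("TMABA dehydrogenation", "ALDH9A1", 8.0),
    ("gamma-butyrobetaine hydroxylation", "BBOX1", 6.0),
]


def max_carnitine_flux(chain, activity):
    """Steady-state flux of a linear chain is limited by its tightest scaled bound."""
    return min(ub * rule_activity(rule, activity) for _, rule, ub in chain)


def knockdown_screen(chain, residual=0.5):
    """Relative change in maximal carnitine flux when each gene keeps `residual` activity."""
    baseline = max_carnitine_flux(chain, {})
    genes = sorted(set().union(*(rule_genes(rule) for _, rule, _ in chain)))
    return {g: max_carnitine_flux(chain, {g: residual}) / baseline - 1.0 for g in genes}


print(knockdown_screen(CHAIN))
# ALDH9A1 about -0.33, BBOX1 -0.5, TMLHE about -0.17
```

Setting `residual=0.0` turns the knockdown into a full knockout, in which case every gene in an unbranched chain abolishes carnitine flux; partial knockdown is what separates the genes by how close their reaction is to being rate-limiting.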

Figure 5.

Alt text: Composite figure with eight panels (A–H) illustrating cell experiment results in prostate cancer cells. (A) Venn diagram showing the intersection of metabolite-related genes for C00318 and C02990. (B) Bar plot of −log₁₀(p) values for the genes in (A). (C) Radial bar plot showing pathway changes after BBOX1 knockdown in the TCGA dataset and our dataset. (D) Colony formation assay images for cells in DMEM and DMEM + carnitine media. (E) Line plot of CCK-8 results for the different treatment groups (D1C, D1W, etc.). (F) PCA scatter plots of metabolite profiles in positive and negative ion modes for DU145_sh1 versus DU145_shNC cells. (G) Pie chart showing the distribution of FFA subclasses. (H) Radial bar plot showing pathway changes.

MODA identifies and confirms a novel molecular mechanism driving prostate cancer progression. (A, B) Venn diagram and bar plot of the key genes regulating both C00318 (left circle and lower bar) and C02990 (right circle and upper bar). (C) Impact of BBOX1 knockdown on biological pathways (left, TCGA data; right, our data). (D) 2D colony formation assay. DMEM, Dulbecco's modified Eagle medium. (E) CCK-8 proliferation assay of transfected cells cultured in DMEM alone or supplemented with carnitine or palmitoylcarnitine. D1C, DU145_sh1 cultured in DMEM + carnitine; D1W, DU145_sh1 cultured in DMEM; D1P, DU145_sh1 cultured in DMEM + palmitoylcarnitine; DCC, DU145_shNC cultured in DMEM + carnitine; DCW, DU145_shNC cultured in DMEM; DCP, DU145_shNC cultured in DMEM + palmitoylcarnitine. (F) PCA score plots (left, positive ion mode; right, negative ion mode). (G) Proportions of significantly differential lipids. (H) Enrichment analysis of differential molecules between DU145_sh1 and DU145_shNC.

In vitro experiments were conducted using BBOX1-knockdown DU145 cells (DU145_sh1) and control cells (DU145_shNC). Western blotting confirmed a significant decrease in BBOX1 expression in the knockdown cells compared with DU145_shNC (Supplementary Fig. S9A). CCK-8 and colony formation assays showed that BBOX1 knockdown inhibited DU145 cell proliferation and viability, with reduced growth rates at multiple time points (Fig. 5D and E, Supplementary Table S17). To investigate the functional role of BBOX1 in prostate cancer, we performed lipidomics analysis and found significant alterations in lipid profiles after BBOX1 knockdown (Fig. 5F); most lipids differed significantly (P < .05, |log2FC| > 0.263). Ceramides (20.25%), PC (8.4%), and sphingomyelin (6.17%) were among the most affected lipid classes (Fig. 5G). Enrichment analysis showed that BBOX1 knockdown disrupted pathways consistent with those found in the validation set, with unsaturated fatty acid biosynthesis emerging as a key metabolic pathway in PRAD progression (Fig. 5H).
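The class percentages above summarize which lipid classes dominate the significant changes. The text does not say whether the denominator is all significant lipids or all detected lipids; the minimal sketch below assumes the former, applies the stated cut-offs (P < .05, |log2FC| > 0.263), and takes the class from the prefix of shorthand lipid names such as `Cer(d18:1/16:0)`. The lipid rows are invented for illustration.

```python
from collections import Counter


def lipid_class(name):
    """Class prefix of a shorthand lipid name, e.g. 'Cer(d18:1/16:0)' -> 'Cer'."""
    return name.split("(", 1)[0]


def class_percentages(lipids, p_cut=0.05, lfc_cut=0.263):
    """Percentage of significant lipids (p < p_cut, |log2FC| > lfc_cut) in each class."""
    significant = [name for name, p, lfc in lipids if p < p_cut and abs(lfc) > lfc_cut]
    counts = Counter(lipid_class(name) for name in significant)
    total = sum(counts.values())
    if not total:
        return {}
    return {cls: round(100 * n / total, 2) for cls, n in counts.most_common()}


# Hypothetical (name, P value, log2FC) rows for DU145_sh1 versus DU145_shNC.
lipids = [
    ("Cer(d18:1/16:0)", 0.01, -0.80),
    ("Cer(d18:1/24:0)", 0.03, 0.41),
    ("PC(34:1)", 0.02, -0.30),
    ("PC(36:2)", 0.20, -0.90),  # fails the P-value cut-off
    ("SM(d18:1/16:0)", 0.04, 0.27),
    ("PE(38:4)", 0.01, 0.10),   # fails the fold-change cut-off
]
print(class_percentages(lipids))  # {'Cer': 50.0, 'PC': 25.0, 'SM': 25.0}
```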

To determine whether the observed inhibition of cell proliferation was due to decreased carnitine and palmitoylcarnitine levels, we conducted supplementation experiments. As shown in Fig. 5D and E, carnitine supplementation significantly increased cell growth in both DU145_shNC and DU145_sh1 cells, with the knockdown cells responding more strongly (Supplementary Table S17). The colony formation assay also supported the conclusion that carnitine supplementation increased the cell growth rate. In contrast, palmitoylcarnitine supplementation did not affect cell growth. These results suggest that carnitine metabolism is regulated by BBOX1 and plays a pivotal role in PRAD progression. These findings support MODA's capability to uncover hidden disease mechanisms and identify key molecular pathways involved in PRAD: BBOX1 regulates carnitine metabolism, thereby affecting lipid biosynthesis and cell proliferation, which provides new insights into PRAD biology and potential therapeutic targets.

Multi-omics data integration analysis demonstrated good generalization across diverse cancer datasets

We further evaluated the generalizability of MODA on TCGA transcriptomics datasets for 21 cancer types, with sample sizes ranging from 44 to 1231. The disease-specific networks constructed for the different cancers contained varying proportions of Hidden_nodes, except for those of SKCM (skin cutaneous melanoma) and SARC (sarcoma) (Fig. 6A). These two disease-specific networks also had fewer nodes and a shorter average path length (Fig. 6B). MODA extracts each disease-specific network from the molecules initially identified as important by RFs in the between-group analysis. However, SKCM and SARC included very few ANT samples (Fig. 6A). This severe class imbalance may compromise ML training and, consequently, MODA's ability to capture hidden information. The JI of the key molecules identified by MODA across different cancers showed that these molecules were highly cancer-specific (JI < 0.05) (Fig. 6C), and the crucial pathways were likewise specific (JI < 0.3) (Fig. 6D and E). Integrating these MODA results, we constructed a cancer–pathway knowledge graph spanning the 21 cancer types (Supplementary Fig. S10).
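The two network summaries used above are simple to state precisely. The Jaccard index of two sets is the size of their intersection divided by the size of their union, so JI < 0.05 between two cancers means they share fewer than 1 in 20 of their combined hub molecules. The average path length is the mean shortest-path distance between node pairs. The sketch below computes both in plain Python; the hub sets are invented placeholders, and for disconnected networks it averages only over mutually reachable pairs, which is one of several possible conventions and not necessarily MODA's.

```python
from collections import deque
from itertools import combinations


def jaccard(a, b):
    """|A & B| / |A | B|; defined as 0.0 when both sets are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_jaccard(sets_by_cancer):
    """JI for every unordered pair of cancer types."""
    return {(x, y): jaccard(sets_by_cancer[x], sets_by_cancer[y])
            for x, y in combinations(sorted(sets_by_cancer), 2)}


def average_path_length(edges):
    """Mean BFS shortest-path length over ordered pairs of mutually reachable nodes."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


# Hypothetical hub-molecule sets (G1..G7 are placeholders, not MODA results).
hubs = {"BRCA": {"G1", "G2", "G3"}, "LUAD": {"G3", "G4", "G5"}, "PRAD": {"G6", "G7"}}
print(pairwise_jaccard(hubs))
print(average_path_length([("A", "B"), ("B", "C")]))  # path graph A-B-C: 8 / 6
```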

Figure 6.

Alt text: Composite figure with five panels (A–E) presenting analyses of biological networks across TCGA cancer types. (A) Grouped bar plots showing percentages of Feature Nodes/Hidden Nodes (left) and proportions of ANT/Tumor samples (right) for various TCGA datasets. (B) Line and bar plot illustrating the average path length in each cancer-specific biological network. (C) Histogram of the JI distribution for hub nodes compared with the standard database. (D) Histogram of the JI distribution for key pathways compared with the standard database. (E) Grouped bar plots comparing counts of enriched pathways from the KEGG (red) and Reactome (green) databases.

Assessment of the generalization performance of MODA across multiple cancers. (A) Overview of the disease-specific networks (left; proportions of Hidden_nodes and Feat_nodes) and the sample labels (right; proportions of tumor tissue and ANT samples). (B) The number of nodes and the average path length within each disease-specific network. (C) The JI of hub nodes obtained by MODA for each type of cancer. (D) The JI of the crucial pathways identified by MODA for each cancer. (E) The number of crucial pathways extracted by MODA across cancer types. Red indicates pathways annotated in the KEGG (Kyoto Encyclopedia of Genes and Genomes) database, and green indicates pathways annotated in the Reactome database.

Discussion

In this study, we introduce MODA, a DL-based framework designed to integrate multi-omics data for comprehensive analysis across various molecular levels. Our results demonstrate that MODA outperforms existing methods in biomarker discovery and elucidation of crucial disease mechanisms. Its accuracy and biological interpretability have been validated through population data and in vitro experiments.

Unlike other state-of-the-art methods, which primarily focus on individual samples [37, 38], MODA employs a molecular network framework that integrates multiple omics layers. By combining pre-existing experimental data with various ML methods, MODA improves data reliability and addresses challenges such as limited sample sizes and imbalanced labels. MODA incorporates a GCN with a multihead attention mechanism to improve the robustness of molecular importance scoring, including for nodes without prior experimental measurements. Finally, the CPM algorithm identifies the central functional modules that govern disease progression while excluding unrelated molecules. By integrating alteration trends with biological functions, MODA effectively identifies pivotal molecules, which are further substantiated by gene knockout simulations.
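Assuming CPM here denotes the clique percolation method, its core idea is that a community is a union of k-cliques that can be reached from one another through k-cliques sharing k − 1 nodes, so a node may belong to several communities at once. The sketch below implements that idea on a plain unweighted graph by finding maximal cliques (Bron–Kerbosch with pivoting) and merging those that share at least k − 1 nodes, the same strategy used by `networkx.algorithms.community.k_clique_communities`. It does not show how MODA couples community detection with GCN-derived scores, and the toy network is invented.

```python
from itertools import combinations


def maximal_cliques(adj):
    """Bron-Kerbosch with pivoting; adj maps node -> set of neighbors."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)
            return
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def k_clique_communities(edges, k=3):
    """Clique percolation: merge maximal cliques (size >= k) sharing >= k - 1 nodes."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    cliques = [c for c in maximal_cliques(adj) if len(c) >= k]
    parent = list(range(len(cliques)))  # union-find over cliques

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) >= k - 1:
            parent[find(i)] = find(j)
    groups = {}
    for i, clique in enumerate(cliques):
        groups.setdefault(find(i), set()).update(clique)
    return sorted((frozenset(g) for g in groups.values()), key=sorted)


# Hypothetical molecular network in which node D belongs to two overlapping modules.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("C", "D"),
         ("D", "E"), ("D", "F"), ("E", "F"), ("F", "G")]
print(k_clique_communities(edges, k=3))
```

Here {A, B, C} and {B, C, D} share an edge and percolate into one module, {D, E, F} forms a second module that overlaps the first at D, and the pendant node G is excluded because it sits in no triangle, which mirrors how CPM drops loosely attached molecules.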

The application of MODA to prostate cancer (PRAD) highlights its capacity to identify hub molecules and disrupted pathways relevant to cancer progression. MODA removes a large amount of irrelevant information and extracts key molecules with superior classification performance (Fig. 4A–C). Using RF, we extracted four Feat_nodes (TROAP, C11orf87, HOXB5, and palmitoylcarnitine) and one Hidden_node (carnitine) for further biological validation. Previous studies have associated palmitoylcarnitine [39], carnitine [40], and HOXB5 [41] with prostate cancer, supporting the biological relevance of our findings. Moreover, independent validation datasets indicate that carnitine, palmitoylcarnitine, and TROAP could serve as potential biomarkers for PRAD screening (Fig. 4; Supplementary Table S9). Gene knockdown experiments identify BBOX1 as a regulator of both carnitine and palmitoylcarnitine metabolism that affects PRAD development and cell proliferation (Fig. 5). A key strength of our study is the complementary use of TCGA–PRAD for performance benchmarking and of in-house metabolomics-inclusive cohorts for clinical and experimental validation, which would not be feasible with TCGA datasets alone.

Compared with traditional pathway enrichment analysis, MODA identifies crucial functional modules from overlapping communities that span multiple omic layers. These modules are involved in various pathways and provide insight into the mechanisms driving disease progression. For instance, in addition to glycolysis, which serves as the primary energy source for cancer cells [42], MODA identified pathways associated with fatty acid oxidation (FAO), such as fatty acid degradation and lysine degradation, as critical metabolic dependencies in PRAD (Fig. 3D). This is mechanistically supported by the overexpression of CPT1A, the rate-limiting enzyme that conjugates carnitine to long-chain fatty acids (e.g. forming palmitoylcarnitine for mitochondrial import), in aggressive tumors [43], where it correlates with poor prognosis [44]. Supplementary Table S8 further reveals multiple PRAD-related pathways linked to FAO dysregulation. The Wnt signaling pathway influences the cellular energy state and indirectly regulates FAO activity. Sphingolipid metabolism and lysine degradation also contribute to fatty acid metabolism and feed intermediates into the TCA cycle. Pathways such as inositol phosphate metabolism additionally modulate cellular energy balance.

Carnitine is a key molecule in lipid metabolism, mediating the β-oxidation of long-chain FFAs by transferring them into the mitochondria through carnitine translocase [45]. Lysine degradation provides the carbon skeletons for carnitine synthesis [45]. Our in vitro experiments also show that carnitine supplementation accelerates cell growth, implying that the carnitine system may act as a key regulator of metabolic flexibility in cancer cells through its control of lipid metabolism [46]. Lipidomics analysis of tissue samples (Supplementary Fig. S7) shows a significant increase in long-chain FFAs, whereas short- and medium-chain FFA levels remain unchanged, as these shorter fatty acids can enter mitochondria without the carnitine/acylcarnitine transport system [47]. Through gene knockdown simulations, the MODA framework identifies BBOX1 as a regulator of carnitine and palmitoylcarnitine metabolism. BBOX1 knockdown significantly decreases cell proliferation, highlighting its potential as a therapeutic target for modulating FAO [48].

Conclusion

MODA provides a powerful framework for advancing multi-omics research by constructing disease-specific knowledge graphs and applying GCN-based models for effective data integration, even with small sample sizes. By inferring the importance scores of undetectable molecules and delineating core functional modules associated with disease progression, MODA enhances the identification of critical disease mechanisms. This approach maximizes data utilization while reducing experimental costs, offering a valuable tool to support mechanistic insights and inform precision medicine strategies. As more metabolomics-inclusive cohorts become available, MODA can be further extended across diverse diseases to uncover novel metabolic dependencies and therapeutic targets.

Key Points

  • Multi-omics data integration analysis (MODA) integrates multi-omics data using a graph convolutional network framework with attention mechanisms to identify hub molecules and elucidate disease pathways, outperforming existing methods.

  • MODA constructs comprehensive knowledge-driven graphs and disease-specific networks, enhancing training efficiency and biological specificity.

  • MODA predicts metabolic flux from transcriptome data, expanding omics dimensions for a more comprehensive understanding of disease mechanisms.

  • MODA utilizes various machine learning algorithms to transform raw omics data into feature importance matrices, facilitating the integration of heterogeneous omics data and enabling effective training with limited sample sizes.

  • MODA employs a GCN-based overlapping community detection algorithm to identify biological functional modules, surpassing conventional pathway annotation constraints and capturing core molecular mechanisms with high precision.
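The key points describe two ingredients that work together: a feature importance matrix (one row per molecule, one column per ML model) and graph convolution over the knowledge graph. The sketch below illustrates, under stated assumptions, why a Hidden_node with no measured data can still receive a score: one symmetric-normalized GCN aggregation step, H' = D^{-1/2}(A + I)D^{-1/2}H, mixes each node's row with its neighbors' rows. It deliberately omits the learned weight matrices, nonlinearity, and multihead attention that MODA uses; the node names, edge, and importance values are invented.

```python
import math


def gcn_aggregate(nodes, edges, features):
    """One GCN aggregation step H' = D^-1/2 (A + I) D^-1/2 H, without weights or nonlinearity."""
    neighbors = {n: {n} for n in nodes}  # self-loops implement A + I
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    degree = {n: len(neighbors[n]) for n in nodes}  # degrees of A + I
    width = len(next(iter(features.values())))
    out = {}
    for n in nodes:
        row = [0.0] * width
        for m in neighbors[n]:
            weight = 1.0 / math.sqrt(degree[n] * degree[m])
            for k in range(width):
                row[k] += weight * features[m][k]
        out[n] = row
    return out


# Hypothetical feature importance matrix: rows are molecules, columns are two ML models.
nodes = ["feat_A", "feat_B", "hidden_C"]
edges = [("feat_B", "hidden_C")]  # knowledge-graph edge; hidden_C was never measured
features = {"feat_A": [0.8, 0.6], "feat_B": [0.9, 0.7], "hidden_C": [0.0, 0.0]}
scores = gcn_aggregate(nodes, edges, features)
print(scores["hidden_C"])  # approximately [0.45, 0.35]
```

Stacking several such layers (with learned weights) lets information reach Hidden_nodes that are more than one edge away from any measured molecule.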

Supplementary Material

Supporting_information_bbaf532

Contributor Information

Jinhui Zhao, State Key Laboratory of Medical Proteomics, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, No. 457 Zhongshan Road, Shahekou District, Dalian, Liaoning 116023, P.R. China; University of Chinese Academy of Sciences, No. 1 Yanqihu East Road, Huairou District, Beijing 100049, P.R. China; Liaoning Province Key Laboratory of Metabolomics, No. 457 Zhongshan Road, Shahekou District, Dalian, Liaoning 116023, P.R. China.

Yanyan Zhou, State Key Laboratory of Medical Proteomics, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, No. 457 Zhongshan Road, Shahekou District, Dalian, Liaoning 116023, P.R. China; Liaoning Province Key Laboratory of Metabolomics, No. 457 Zhongshan Road, Shahekou District, Dalian, Liaoning 116023, P.R. China; Hepatobiliary Surgery Department, Dalian Medical University, No. 9, West Section of Lvshun South Road, Dalian 116044, P.R. China.

Han Bao, State Key Laboratory of Medical Proteomics, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, No. 457 Zhongshan Road, Shahekou District, Dalian, Liaoning 116023, P.R. China; University of Chinese Academy of Sciences, No. 1 Yanqihu East Road, Huairou District, Beijing 100049, P.R. China; Liaoning Province Key Laboratory of Metabolomics, No. 457 Zhongshan Road, Shahekou District, Dalian, Liaoning 116023, P.R. China.

Xinjie Zhao, State Key Laboratory of Medical Proteomics, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, No. 457 Zhongshan Road, Shahekou District, Dalian, Liaoning 116023, P.R. China; University of Chinese Academy of Sciences, No. 1 Yanqihu East Road, Huairou District, Beijing 100049, P.R. China; Liaoning Province Key Laboratory of Metabolomics, No. 457 Zhongshan Road, Shahekou District, Dalian, Liaoning 116023, P.R. China.

Xinxin Wang, State Key Laboratory of Medical Proteomics, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, No. 457 Zhongshan Road, Shahekou District, Dalian, Liaoning 116023, P.R. China; University of Chinese Academy of Sciences, No. 1 Yanqihu East Road, Huairou District, Beijing 100049, P.R. China; Liaoning Province Key Laboratory of Metabolomics, No. 457 Zhongshan Road, Shahekou District, Dalian, Liaoning 116023, P.R. China.

Chunxia Zhao, State Key Laboratory of Medical Proteomics, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, No. 457 Zhongshan Road, Shahekou District, Dalian, Liaoning 116023, P.R. China; University of Chinese Academy of Sciences, No. 1 Yanqihu East Road, Huairou District, Beijing 100049, P.R. China; Liaoning Province Key Laboratory of Metabolomics, No. 457 Zhongshan Road, Shahekou District, Dalian, Liaoning 116023, P.R. China.

Wangshu Qin, State Key Laboratory of Medical Proteomics, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, No. 457 Zhongshan Road, Shahekou District, Dalian, Liaoning 116023, P.R. China; University of Chinese Academy of Sciences, No. 1 Yanqihu East Road, Huairou District, Beijing 100049, P.R. China; Liaoning Province Key Laboratory of Metabolomics, No. 457 Zhongshan Road, Shahekou District, Dalian, Liaoning 116023, P.R. China.

Xin Lu, State Key Laboratory of Medical Proteomics, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, No. 457 Zhongshan Road, Shahekou District, Dalian, Liaoning 116023, P.R. China; University of Chinese Academy of Sciences, No. 1 Yanqihu East Road, Huairou District, Beijing 100049, P.R. China; Liaoning Province Key Laboratory of Metabolomics, No. 457 Zhongshan Road, Shahekou District, Dalian, Liaoning 116023, P.R. China.

Guowang Xu, State Key Laboratory of Medical Proteomics, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, No. 457 Zhongshan Road, Shahekou District, Dalian, Liaoning 116023, P.R. China; University of Chinese Academy of Sciences, No. 1 Yanqihu East Road, Huairou District, Beijing 100049, P.R. China; Liaoning Province Key Laboratory of Metabolomics, No. 457 Zhongshan Road, Shahekou District, Dalian, Liaoning 116023, P.R. China.

Author contributions

G.X. and X.L. supervised the project. J.Z. developed the DL framework and led benchmarks and case studies. Y.Z. and W.Q. conducted the biological validations and functional analyses. H.B. and X.W. contributed to the optimization of the model framework. X.Z. and C.Z. provided essential feedback and improvements for the methodological framework. J.Z. drafted the manuscript, which was revised by X.L. and G.X. All authors reviewed and approved the final manuscript.

Conflict of interest: None declared.

Funding

This research was supported by the National Natural Science Foundation of China (grant nos 22434006, 22274153, and 22274151), and the Innovation Program (grant no. DICP I202334) of Science and Research from the DICP, CAS, and the AI S&T Program (grant no. DNL-YL A202202) from Yulin Branch, Dalian National Laboratory for Clean Energy, CAS, China, and Liaoning Province International Science and Technology Cooperation Program Project (grant no. 2023JH2/10700023).

Data availability

Omics profiles for multiple cancer types were downloaded from the TCGA database (https://portal.gdc.cancer.gov/) using the "TCGAbiolinks" R package. Clinical data, demographics, and other clinical features of these patients were also extracted from the TCGA database. The additional prostate cancer multi-omics datasets are available from the corresponding authors upon request; they are not publicly available due to privacy or ethical restrictions. All related scripts and code supporting this study are publicly available at https://github.com/zhaoxiaoqi0714/MODA.

Ethics approval and consent to participate

This study was conducted according to the ethical standards of the local institutional review board, and all participants provided written informed consent before their involvement.

References

  • 1. Nitsch L, Lareau CA, Ludwig LS. Mitochondrial genetics through the lens of single-cell multi-omics. Nat Genet 2024;56:1355–65. 10.1038/s41588-024-01794-8
  • 2. Kernohan KD, Boycott KM. The expanding diagnostic toolbox for rare genetic diseases. Nat Rev Genet 2024;25:401–15. 10.1038/s41576-023-00683-w
  • 3. Abedini A, Levinsohn J, Klötzer KA. et al. Single-cell multi-omic and spatial profiling of human kidneys implicates the fibrotic microenvironment in kidney disease progression. Nat Genet 2024;56:1712–24. 10.1038/s41588-024-01802-x
  • 4. He X, Liu X, Zuo F. et al. Artificial intelligence-based multi-omics analysis fuels cancer precision medicine. Semin Cancer Biol 2023;88:187–200. 10.1016/j.semcancer.2022.12.009
  • 5. Huttlin EL, Bruckner RJ, Navarrete-Perea J. et al. Dual proteome-scale networks reveal cell-specific remodeling of the human interactome. Cell 2021;184:3022–3040.e28. 10.1016/j.cell.2021.04.011
  • 6. Karayel O, Virreira Winter S, Padmanabhan S. et al. Proteome profiling of cerebrospinal fluid reveals biomarker candidates for Parkinson’s disease. Cell Rep Med 2022;3:100661. 10.1016/j.xcrm.2022.100661
  • 7. Wishart DS. Metabolomics for investigating physiological and pathophysiological processes. Physiol Rev 2019;99:1819–75. 10.1152/physrev.00035.2018
  • 8. Zhou G, Li S, Xia J. Network-based approaches for multi-omics integration. Methods Mol Biol 2020;2104:469–87. 10.1007/978-1-0716-0239-3_23
  • 9. Chen W, Zhang P, Zhang X. et al. Machine learning-causal inference based on multi-omics data reveals the association of altered gut bacteria and bile acid metabolism with neonatal jaundice. Gut Microbes 2024;16:2388805. 10.1080/19490976.2024.2388805
  • 10. Wang K, Abid MA, Rasheed A. et al. DNNGP, a deep neural network-based method for genomic prediction using multi-omics data in plants. Mol Plant 2023;16:279–93. 10.1016/j.molp.2022.11.004
  • 11. Forrest IS, Petrazzini BO, Duffy Á. et al. Machine learning-based marker for coronary artery disease: derivation and validation in two longitudinal cohorts. Lancet 2023;401:215–25. 10.1016/S0140-6736(22)02079-7
  • 12. Chi C, Ye Y, Chen B. et al. Bipartite graph-based approach for clustering of cell lines by gene expression–drug response associations. Bioinformatics 2021;37:2617–26. 10.1093/bioinformatics/btab143
  • 13. Theodoris CV, Xiao L, Chopra A. et al. Transfer learning enables predictions in network biology. Nature 2023;618:616–24. 10.1038/s41586-023-06139-9
  • 14. Xuan P, Gong Z, Cui H. et al. Fully connected autoencoder and convolutional neural network with attention-based method for inferring disease-related lncRNAs. Brief Bioinform 2022;23:bbac089. 10.1093/bib/bbac089
  • 15. Lin L, Xiong M, Zhang G. et al. A convolutional neural network and graph convolutional network based framework for AD classification. Sensors (Basel) 2023;23:1914. 10.3390/s23041914
  • 16. Zhang Q, Wei Y, Han Z. et al. Multimodal fusion on low-quality data: a comprehensive survey. arXiv, 10.48550/arXiv.2404.18947, 1 November 2024, preprint: not peer reviewed.
  • 17. Novakovsky G, Dexter N, Libbrecht MW. et al. Obtaining genetics insights from deep learning via explainable artificial intelligence. Nat Rev Genet 2023;24:125–37. 10.1038/s41576-022-00532-2
  • 18. Lewis JE, Forshaw TE, Boothman DA. et al. Personalized genome-scale metabolic models identify targets of redox metabolism in radiation-resistant tumors. Cell Syst 2021;12:68–81.e11. 10.1016/j.cels.2020.12.001
  • 19. Heirendt L, Arreckx S, Pfau T. et al. Creation and analysis of biochemical constraint-based models using the COBRA Toolbox v.3.0. Nat Protoc 2019;14:639–702. 10.1038/s41596-018-0098-2
  • 20. Argelaguet R, Arnol D, Bredikhin D. et al. MOFA+: a statistical framework for comprehensive integration of multi-modal single-cell data. Genome Biol 2020;21:111. 10.1186/s13059-020-02015-1
  • 21. Rohart F, Gautier B, Singh A. et al. mixOmics: an R package for ‘omics feature selection and multiple data integration. PLoS Comput Biol 2017;13:e1005752. 10.1371/journal.pcbi.1005752
  • 22. Moon S, Lee H. MOMA: a multi-task attention learning algorithm for multi-omics data interpretation and classification. Bioinformatics 2022;38:2287–96. 10.1093/bioinformatics/btac080
  • 23. Li X, Ma J, Leng L. et al. MoGCN: a multi-omics integration method based on graph convolutional network for cancer subtype analysis. Front Genet 2022;13:806842. 10.3389/fgene.2022.806842
  • 24. OmiEmbed: a unified multi-task deep learning framework for multi-omics data. PubMed. https://pubmed.ncbi.nlm.nih.gov/34207255/ (17 December 2024, date last accessed).
  • 25. Ren S, Shao Y, Zhao X. et al. Integration of metabolomics and transcriptomics reveals major metabolic pathways and potential biomarker involved in prostate cancer. Mol Cell Proteomics 2016;15:154–63. 10.1074/mcp.M115.052381
  • 26. Chen S, Hoene M, Li J. et al. Simultaneous extraction of metabolome and lipidome with methyl tert-butyl ether from a single small tissue sample for ultra-high performance liquid chromatography/mass spectrometry. J Chromatogr A 2013;1298:9–16. 10.1016/j.chroma.2013.05.019
  • 27. Li J, Ren S, Piao HL. et al. Integration of lipidomics and transcriptomics unravels aberrant lipid metabolism and defines cholesteryl oleate as potential biomarker of prostate cancer. Sci Rep 2016;6:20984. 10.1038/srep20984
  • 28. Ren S, Peng Z, Mao JH. et al. RNA-seq analysis of prostate cancer in the Chinese population identifies recurrent gene fusions, cancer-associated long noncoding RNAs and aberrant alternative splicings. Cell Res 2012;22:806–21. 10.1038/cr.2012.30
  • 29. Zhang X, Xing Y, Sun K. et al. OmiEmbed: a unified multi-task deep learning framework for multi-omics data. Cancers (Basel) 2021;13:3047. 10.3390/cancers13123047
  • 30. Vatapalli R, Sagar V, Rodriguez Y. et al. Histone methyltransferase DOT1L coordinates AR and MYC stability in prostate cancer. Nat Commun 2020;11:4153. 10.1038/s41467-020-18013-7
  • 31. Thomas T. DOT1L in prostate cancer. Nat Rev Urol 2020;17:544. 10.1038/s41585-020-0374-0
  • 32. Quan Y, Zhang X, Wang M. et al. Histone lysine methylation patterns in prostate cancer microenvironment infiltration: integrated bioinformatic analysis and histological validation. Front Oncol 2022;12:981226. 10.3389/fonc.2022.981226
  • 33. Yang L, Jin M, Park SJ. et al. SETD1A promotes proliferation of castration-resistant prostate cancer cells via FOXM1 transcription. Cancers (Basel) 2020;12:1736. 10.3390/cancers12071736
  • 34. Zhang J, Ding X, Peng K. et al. Identification of biomarkers for immunotherapy response in prostate cancer and potential drugs to alleviate immunosuppression. Aging (Albany NY) 2022;14:4839–57. 10.18632/aging.204115
  • 35. Bova GS, Kallio HML, Annala M. et al. Integrated clinical, whole-genome, and transcriptome analysis of multisampled lethal metastatic prostate cancer. Cold Spring Harb Mol Case Stud 2016;2:a000752. 10.1101/mcs.a000752
  • 36. Craddock J, Jiang J, Patrick SM. et al. Alterations in the epigenetic machinery associated with prostate cancer health disparities. Cancers (Basel) 2023;15:3462. 10.3390/cancers15133462
  • 37. Steyaert S, Pizurica M, Nagaraj D. et al. Multimodal data fusion for cancer biomarker discovery with deep learning. Nat Mach Intell 2023;5:351–62. 10.1038/s42256-023-00633-5
  • 38. Ding S, Li J, Wang J. et al. Multimodal co-attention fusion network with online data augmentation for cancer subtype classification. IEEE Trans Med Imaging 2024;43. 10.1109/TMI.2024.3405535
  • 39. Al-Bakheit A, Traka M, Saha S. et al. Accumulation of palmitoylcarnitine and its effect on pro-inflammatory pathways and calcium influx in prostate cancer. Prostate 2016;76:1326–37. 10.1002/pros.23222
  • 40. Triscott J, Lehner M, Benjak A. et al. Loss of PI5P4Kα slows the progression of a Pten mutant basal cell model of prostate cancer. Mol Cancer Res 2025;23:33–45. 10.1158/1541-7786.MCR-24-0290
  • 41. Sekino Y, Pham QT, Kobatake K. et al. HOXB5 overexpression is associated with neuroendocrine differentiation and poor prognosis in prostate cancer. Biomedicines 2021;9:893. 10.3390/biomedicines9080893
  • 42. Tan Y, Li J, Zhao G. et al. Metabolic reprogramming from glycolysis to fatty acid uptake and beta-oxidation in platinum-resistant cancer cells. Nat Commun 2022;13:4554. 10.1038/s41467-022-32101-w
  • 43. Ma L, Chen C, Zhao C. et al. Targeting carnitine palmitoyl transferase 1A (CPT1A) induces ferroptosis and synergizes with immunotherapy in lung cancer. Signal Transduct Target Ther 2024;9:64. 10.1038/s41392-024-01772-w
  • 44. Li  R, Li  X, Zhao  J. et al.  Mitochondrial STAT3 exacerbates LPS-induced sepsis by driving CPT1a-mediated fatty acid oxidation. Theranostics  2022;12:976–98. 10.7150/thno.63751 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45. Meng  Y, Guo  D, Lin  L. et al.  Glycolytic enzyme PFKL governs lipolysis by promoting lipid droplet-mitochondria tethering to enhance β-oxidation and tumor cell proliferation. Nat Metab  2024;6:1092–107. 10.1038/s42255-024-01047-2 [DOI] [PubMed] [Google Scholar]
  • 46. Melone  MAB, Valentino  A, Margarucci  S. et al.  The carnitine system and cancer metabolic plasticity. Cell Death Dis  2018;9:228. 10.1038/s41419-018-0313-7 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47. Nicholas  DA, Proctor  EA, Agrawal  M. et al.  Fatty acid metabolites combine with reduced β oxidation to activate Th17 inflammation in human type 2 diabetes. Cell Metab  2019;30:447–461.e5. 10.1016/j.cmet.2019.07.004 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48. Wang  J, Zhou  Y, Zhang  D. et al.  CRIP1 suppresses BBOX1-mediated carnitine metabolism to promote stemness in hepatocellular carcinoma. EMBO J  2022;41:e110218. 10.15252/embj.2021110218 [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data


Supplementary Materials

Supporting_information_bbaf532

Data Availability Statement

Omics profiles for multiple cancer types were downloaded from the TCGA database (https://portal.gdc.cancer.gov/) using the "TCGAbiolinks" R package. Clinical data, demographics, and other clinical features of these patients were also extracted from the TCGA database. The additional prostate cancer multi-omics datasets are available from the corresponding authors upon request; they are not publicly available due to privacy or ethical restrictions. All scripts and code supporting this study are publicly available at https://github.com/zhaoxiaoqi0714/MODA.


Articles from Briefings in Bioinformatics are provided here courtesy of Oxford University Press
