Abstract
Advances in omics technologies provide unprecedented opportunities for systems biology, yet integrating multi-omics data remains challenging due to its complexity, heterogeneity, and the sparsity of prior knowledge networks. Here, we introduce a multi-omics data integration analysis (MODA) framework that fully incorporates prior knowledge to identify hub molecules and pathways, and elucidate biological mechanisms. By leveraging multiple machine learning approaches, MODA transforms raw omics data into a feature importance matrix that is mapped onto a biological knowledge graph to mitigate omics data noise. Then, it uses graph convolutional networks with attention mechanisms to capture intricate molecular relationships and rank molecules via a feature-selective layer. Ultimately, MODA transcends the limitations of predefined pathway annotations by employing an overlapping community detection algorithm to extract core functional modules that are involved in multiple pivotal disease pathways. Systematic evaluations show that MODA outperforms seven existing multi-omics integration methods in classification performance while maintaining biological interpretability. Moreover, MODA achieves superior stability in pan-cancer datasets. Application to the multi-omics datasets of prostate cancer reveals a key role for carnitine and palmitoylcarnitine, regulated by BBOX1 in the progression of prostate cancer. Population samples and in vitro experiments further validate these findings. With high data utilization efficiency and low computational cost, MODA serves as a robust tool for uncovering novel disease mechanisms and advancing precision medicine.
Keywords: multi-omics integration; deep learning; graph convolutional networks; biological knowledge graph; prostate cancer
Introduction
Advancements in analytical technologies have significantly revolutionized various omics methods, including genomics, transcriptomics, proteomics, and metabolomics. These developments have generated vast datasets that contribute invaluable insights into the molecular basis of biological processes, improving our understanding of phenotype variation [1], genetic diseases [2], disease diagnosis, and prognosis [3], as well as the role of proteins in cellular functions and cancer therapies [4], and medical research [5, 6]. However, despite the valuable insights offered by individual omics, each omics approach provides only a partial view of the complex molecular regulatory networks [4]. It is necessary to integrate multi-omics data to achieve a more comprehensive understanding of life processes.
Metabolomics reflects both endogenous metabolic pathways and external factors such as diet, drugs, toxins, and lifestyle choices. It plays a crucial role in bridging the gap between genotypes and phenotypes, providing unique insights into the complex gene–environment interactions [7]. Despite its growing significance, the integration of metabolomics with other omics data remains challenging due to the unique complexities of metabolomics data, including high dimensionality, variability, data sparsity, and missing values. These challenges are often inadequately addressed by conventional multi-omics integration approaches, which tend to focus on statistical correlations, network-based associations, or machine learning (ML) models that may not fully capture the nonlinear and context-dependent nature of biological systems.
The prevailing approaches for multi-omics data integration typically fall into three categories: statistical integration, network-based integration [8], and ML methods [9]. Statistical methods aim to identify shared patterns across datasets but often overlook the complex and nonlinear relationships inherent in biological data. Network-based approaches [10], though valuable for mapping molecular interactions, can oversimplify the complex, dynamic, and multifaceted nature of omics data integration. ML methods, especially ensemble models such as random forests (RFs) or gradient boosting, have demonstrated success in biomarker identification and classification tasks, benefiting from feature selection and robustness to noise [11]. However, these models still rely on handcrafted features or shallow representations, limiting their capacity to model complex biological processes. In contrast, deep learning (DL) has shown promise in uncovering hidden patterns from high-dimensional omics data. From a training perspective, popular ensemble/ML methods (e.g. RFs) can become memory-intensive when applied to large-scale datasets [12]. As a rapidly advancing branch of ML, DL also supports pretrained models that enable transfer learning across related biological tasks, whereas ensemble models typically require retraining from scratch [13]. In particular, given the inherent properties of biological networks, graph-structured DL frameworks have been successfully applied to infer biological mechanisms [14] and assist in disease diagnosis [15]. Nevertheless, biological data, especially in multi-omics integration, often exhibit high heterogeneity and noise, complicating effective feature extraction [16], increasing the risk of overfitting, and limiting interpretability [17]. Therefore, selecting appropriate input features and strategically combining the strengths of ML and DL present a promising direction for advancing multi-omics data integration.
To address these challenges, we introduce MODA, a multi-omics data integration analysis framework that leverages graph convolutional networks (GCNs) with attention mechanisms and prior knowledge (Fig. 1). MODA is specifically designed to enhance the integration of metabolomics with other omics data, facilitating the discovery of hub molecules and pathways in disease research and enabling more accurate and interpretable multi-omics integration. By utilizing GCNs, MODA captures both omics-specific features and the intricate relationships among molecules, providing a deeper understanding of disease mechanisms. We demonstrate the efficacy of MODA in uncovering novel hub molecules and pathways across different stages of prostate cancer (PRAD), which is supported by both population-based and in vitro validation. Our findings underscore the potential of MODA as a robust tool for advancing precision medicine and enhancing our understanding of complex biological systems.
Figure 1.
Workflow of the GCN-based MODA framework for unraveling hub molecules and disease-specific modules. The framework consists of two main parts: disease-specific network construction and information mining. Disease-specific network construction involves the integration of data from multiple open-access databases to generate a biological interaction knowledge graph. In the absence of metabolomics data, metabolic fluxes are predicted from transcriptomics profiles. Disease-specific multi-omics experimental data are collected and transformed into an attribution matrix representing node importance scores through ML and network topology analyses. Using the experimental data, a disease-specific biological network is extracted. This matrix is mapped onto the biological graph to facilitate information mining. The DL framework processes the disease-specific biological network to prioritize key molecules and core modules. VIP, variable importance in projection; CPM, clique percolation method. Feat_nodes, nodes with experimentally measured omics data; Hidden_nodes, nodes without direct experimental measurements; predicted nodes, nodes with importance scores predicted by MODA.
Materials and methods
Omics data collection
Performance evaluation
We employed a stepwise strategy integrating public and in-house omics datasets to evaluate MODA’s performance. The mRNA expression profiles (Fragments Per Kilobase of transcript per Million mapped fragments normalization, FPKM), miRNA expression profiles, and phenotype data of prostate cancer (PRAD) were downloaded from The Cancer Genome Atlas database (TCGA, https://portal.gdc.cancer.gov/) by the “TCGAbiolinks” R package. The samples were divided into cancerous tissues or adjacent normal tissues (ANTs) based on the sample labels (Cancerous tissues, 01A; ANTs, 11A). Cancerous tissues were divided into T2 and T3 stages based on the Tumor, Node, Metastasis (TNM) classification system. Since TCGA–PRAD does not provide metabolomics, metabolic fluxes for PRAD datasets were predicted using mRNA profiles by the bioinformatics pipeline developed by Lewis [18]. Code execution was performed in PyCharm (Professional Version 2022.2.3, JetBrains, Prague, Czech Republic). The COBRA Toolbox [19] was applied to simulate genome-wide knockouts and infer the gene functions.
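The sample grouping described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes standard dash-separated TCGA barcodes in which the fourth field carries the sample-type code and vial letter, and the function name and example barcodes are hypothetical.

```python
def tcga_sample_group(barcode: str) -> str:
    """Map a TCGA barcode to the tissue group used in the text.

    The fourth dash-separated field holds the two-digit sample-type code
    plus the vial letter, e.g. "01A" (primary tumor) or "11A" (solid
    tissue normal). Anything else is reported as "other".
    """
    fields = barcode.split("-")
    if len(fields) < 4:
        raise ValueError(f"barcode has no sample field: {barcode!r}")
    sample_field = fields[3]
    if sample_field == "01A":
        return "cancerous"
    if sample_field == "11A":
        return "ANT"
    return "other"


# Hypothetical barcodes, for illustration only.
examples = [
    "TCGA-XX-0001-01A-11R-0000-07",
    "TCGA-XX-0001-11A-11R-0000-07",
    "TCGA-XX-0002-06A",
]
groups = [tcga_sample_group(b) for b in examples]
```

Samples labeled "other" (for example metastatic or recurrent sample types) would simply be excluded from the cancerous-versus-ANT comparison under this sketch.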
To validate the key molecules and pathways identified in TCGA–PRAD, two independent datasets were collected from PRAD patients who underwent radical prostatectomy at Shanghai Changhai Hospital. These datasets uniquely include metabolomics, lipidomics, transcriptomics, and miRNA profiles with comprehensive clinical annotations. Batch 1 included 21 pairs of cancerous and matched ANTs, while Batch 2 included 50 cancerous tissues and 50 ANTs (49 paired). All experimental protocols were approved by the institutional review board, and informed consent was obtained from all subjects.
Generalization verification
To further evaluate the generalizability of MODA beyond prostate cancer, we obtained the transcriptomics profiles across 21 cancer types from NCBI using the TCGA portal (https://portal.gdc.cancer.gov/).
Multi-omics data integration analysis computational framework
Construction of disease-specific biological network
A disease-specific biological knowledge graph was assembled from multiple curated databases, including KEGG (https://www.genome.jp/kegg/), HMDB (https://hmdb.ca/), BRENDA (https://www.brenda-enzymes.org/), STRING (https://string-db.org/), iRefIndex (https://irefindex.vib.be/), HuRi (http://www.interactome-atlas.org/), TRRUST (http://www.grnpedia.org/trrust), and OmniPath (“OmniPath” R package). Interactions among metabolites, genes, enzymes, and miRNA were standardized and deduplicated to generate a unified undirected graph.
To generate initial feature representations, we applied five complementary ML and statistical methods: t-tests (p-FDR), fold change (FC), RF, LASSO, and Partial Least Squares Discriminant Analysis (PLS-DA). These methods produced feature-level importance scores that were normalized and integrated into a unified attribute matrix. This matrix reflects the contribution of each molecule to disease classification. Significant molecules derived from diverse omics types were mapped into the biological network as seed nodes. A k-step neighborhood subgraph was constructed by expanding from the seed nodes. Based on the evaluation of k values ranging from 1 to 5 (Supplementary Fig. S1), k = 2 was selected to balance network coverage and maintain an approximately 1:1 ratio between Feat_nodes (nodes with experimental measurements) and Hidden_nodes (nodes without direct experimental measurements). Hidden_nodes are molecules present in the knowledge graph but lacking direct experimental data in the study dataset; they capture unmeasured but biologically relevant components. Including these nodes allows the model to propagate information across the network, improving system-level inference. The constructed subgraph served as the input for the graph learning module.
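As an illustration of how per-method scores could be combined into a unified attribute matrix, the sketch below rescales each method's scores to [0, 1] and stacks them column-wise. The text does not state which normalization was used, so min-max scaling is an assumption here, as are the function names and toy values; scores are assumed to be oriented so that larger means more important (for example, p-values converted to -log10 beforehand).

```python
def min_max(values):
    """Rescale a list of scores to [0, 1]; a constant column maps to zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def build_attribute_matrix(scores_by_method):
    """Combine per-method scores into one row of normalized scores per molecule.

    scores_by_method: {method: {molecule: raw score}}; molecules missing from
    a method receive a raw score of 0.0 (an assumption of this sketch).
    Returns (method order, {molecule: [normalized score per method]}).
    """
    methods = sorted(scores_by_method)
    molecules = sorted({m for s in scores_by_method.values() for m in s})
    columns = []
    for method in methods:
        raw = [scores_by_method[method].get(mol, 0.0) for mol in molecules]
        columns.append(min_max(raw))
    matrix = {mol: [col[i] for col in columns] for i, mol in enumerate(molecules)}
    return methods, matrix


# Toy scores for three hypothetical molecules and two methods.
toy_scores = {
    "rf": {"geneA": 2.0, "geneB": 1.0, "metC": 0.0},
    "abs_log2fc": {"geneA": 0.5, "geneB": 1.5, "metC": 1.0},
}
methods, attribute_matrix = build_attribute_matrix(toy_scores)
```

Each row of `attribute_matrix` then serves as the initial feature vector of the corresponding node when the matrix is mapped onto the graph.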
Graph representation learning and score prediction
Given the constructed graph $G = (V, E)$, where $V$ includes Feat_nodes and Hidden_nodes, $E$ is the set of undirected edges, $X$ represents the feature matrix derived from multi-omics importance scores, and $Y$ is the class label (available only for Feat_nodes), MODA applies a two-layer GCN to propagate and refine node attributes. The GCN updates node attributes by the aggregation function and stacks multiple graph convolution layers with the following functions:

$$h_{\mathcal{N}(i)}^{(k)} = \mathrm{AGG}\left(\left\{h_j^{(k-1)},\ \forall j \in \mathcal{N}(i)\right\}\right) \tag{1}$$

$$h_i^{(k)} = \sigma\left(W^{(k)} \cdot \mathrm{CONCAT}\left(h_i^{(k-1)},\ h_{\mathcal{N}(i)}^{(k)}\right)\right) \tag{2}$$

$$h_i^{(k)} \leftarrow \frac{h_i^{(k)}}{\left\lVert h_i^{(k)} \right\rVert_2} \tag{3}$$

where $h_i^{(k)}$ denotes the embedding of node $i$ at the $k$-th layer, $\mathrm{AGG}$ defines the aggregator function, and $\mathcal{N}(i)$ represents the neighbors of node $i$. $\sigma$ denotes the nonlinear activation function, and $W^{(k)}$ is the weight matrix at the $k$-th iteration. After applying the weighted aggregation, the information for each node is updated by incorporating features from its neighbors via graph edges. This process refines and imputes node representations.
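To make the propagation step concrete, the following sketch applies one aggregate-and-update layer twice on a three-node path graph. It is an illustration under stated assumptions rather than the MODA implementation: the mean aggregator, the concatenation of self and neighbor features, the ReLU activation, the hand-picked weights, and all names are assumptions, and any normalization of the embeddings is omitted for readability.

```python
def relu(x):
    return max(0.0, x)


def mean_aggregate(h, neighbors):
    """Element-wise mean of the neighbors' embeddings (zeros if isolated)."""
    dim = len(next(iter(h.values())))
    if not neighbors:
        return [0.0] * dim
    return [sum(h[j][d] for j in neighbors) / len(neighbors) for d in range(dim)]


def propagate(h, adjacency, weight):
    """One layer: aggregate neighbors, concatenate with self, linear map, ReLU."""
    new_h = {}
    for i, h_i in h.items():
        h_nbr = mean_aggregate(h, adjacency.get(i, []))
        concat = h_i + h_nbr  # list concatenation: [self features, neighbor mean]
        new_h[i] = [relu(sum(w * x for w, x in zip(row, concat))) for row in weight]
    return new_h


# Path graph a - b - c; only "a" carries a measured score (a Feat_node),
# while "b" and "c" start at zero (Hidden_nodes).
adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
h0 = {"a": [1.0], "b": [0.0], "c": [0.0]}
W = [[1.0, 1.0]]  # toy 1 x 2 weight matrix, not learned

h1 = propagate(h0, adjacency, W)
h2 = propagate(h1, adjacency, W)
```

After one layer, node "c" is still zero because it is two hops from the only measured node; after the second layer it receives a nonzero value. This is the sense in which a two-layer network can impute representations for unmeasured nodes within a two-step neighborhood.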
The Feat_nodes were randomly split in a 7:3 ratio into training and validation sets, with the Hidden_nodes as the test set. The training set was used to optimize graph embeddings by integrating node attributes with importance scores derived from multiple ML methods and topological features. The root mean square error (RMSE) loss function was used to calculate the error between the model predictions and the actual values. Model parameters were optimized through supervised learning using a stochastic gradient descent algorithm. The trained model then predicted the importance scores for Hidden_nodes. The Clique Percolation Method (CPM), an unsupervised clustering algorithm, detected the network communities based on the learned graph embeddings. Hyperparameters of the DL model are summarized in Supplementary Table S1, while the MODA pseudocode and schematic diagram are provided in Algorithm 1 and Supplementary Fig. S2, respectively.
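The data split and the loss can be sketched in a few lines. This is a minimal illustration: the function names, the fixed seed, and the toy node identifiers are assumptions, and the actual optimizer and training loop are not reproduced.

```python
import math
import random


def split_feat_nodes(feat_nodes, train_fraction=0.7, seed=0):
    """Randomly split Feat_nodes into training and validation sets (7:3 by default)."""
    nodes = sorted(feat_nodes)          # sort first so the split is reproducible
    rng = random.Random(seed)
    rng.shuffle(nodes)
    cut = round(train_fraction * len(nodes))
    return nodes[:cut], nodes[cut:]


def rmse(predicted, actual):
    """Root mean square error between predicted and actual importance scores."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(predicted))


# Ten hypothetical Feat_nodes.
feat_nodes = [f"n{i}" for i in range(10)]
train_nodes, val_nodes = split_feat_nodes(feat_nodes)
example_error = rmse([1.0, 2.0], [1.0, 4.0])  # sqrt((0 + 4) / 2)
```

Hidden_nodes never enter this split; they are scored only after training, which is why they act as the test set.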
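Algorithm 1: The overall process of MODA.
Input:
$G = (V, E)$: Biological graph (nodes V, edges E).
$X$: Node feature matrix.
K: Propagation layers (depth).
$W^{(k)}$: Weight matrices for layer k.
Output:
Z: final node embeddings.
Community assignments: community detection via CPM.
1: Initialize: node representations $h_i^{(0)} = x_i$ for all $i \in V$
2: Define: $\sigma$ (nonlinear activation), $\mathrm{AGG}$ (differentiable aggregator), $\mathcal{N}(\cdot)$ (neighborhood function)
3: for k = 1, …, K do
4:   for each $i \in V$ do
5:     $h_{\mathcal{N}(i)}^{(k)} \leftarrow \mathrm{AGG}(\{h_j^{(k-1)}, \forall j \in \mathcal{N}(i)\})$  # Aggregate neighbor features
6:     $h_i^{(k)} \leftarrow \sigma(W^{(k)} \cdot \mathrm{CONCAT}(h_i^{(k-1)}, h_{\mathcal{N}(i)}^{(k)}))$  # Update node embedding
7:   end for
8:   $h_i^{(k)} \leftarrow h_i^{(k)} / \lVert h_i^{(k)} \rVert_2$ for all $i \in V$
9: end for
10: $z_i \leftarrow h_i^{(K)}$ for all $i \in V$
11: Perform community detection with CPM on Z
12: return Z, Community assignments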
The score of each community was then calculated and ranked based on the importance scores of its nodes. Using these scores, MODA extracted the hub molecules. All hub molecules were divided into Feat_nodes (with experimental omics data) and Hidden_nodes (without experimental data).
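The community step can be illustrated with a small, self-contained clique percolation sketch followed by a ranking of communities. Several choices here are assumptions made for illustration: the sketch runs on plain graph topology with clique size k = 3, the community score is taken as the mean importance of its nodes (the text does not specify the scoring formula), and all names and toy values are hypothetical.

```python
from itertools import combinations


def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def k_cliques(adjacency, k):
    """All fully connected node sets of size k (brute force, small graphs only)."""
    return [frozenset(c) for c in combinations(sorted(adjacency), k)
            if all(v in adjacency[u] for u, v in combinations(c, 2))]


def clique_percolation(adjacency, k=3):
    """Communities = unions of k-cliques chained through k-1 shared nodes."""
    cliques = k_cliques(adjacency, k)
    parent = list(range(len(cliques)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(range(len(cliques)), 2):
        if len(cliques[a] & cliques[b]) == k - 1:
            parent[find(a)] = find(b)
    groups = {}
    for idx, clique in enumerate(cliques):
        groups.setdefault(find(idx), set()).update(clique)
    return sorted(sorted(g) for g in groups.values())


def rank_communities(communities, importance):
    """Score each community by mean node importance and sort descending."""
    scored = [(sum(importance[n] for n in c) / len(c), c) for c in communities]
    return sorted(scored, key=lambda item: item[0], reverse=True)


# Two triangles sharing an edge, plus a third triangle touching node "d".
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d"),
         ("d", "e"), ("d", "f"), ("e", "f")]
communities = clique_percolation(build_adjacency(edges), k=3)
importance = {"a": 1.0, "b": 1.0, "c": 1.0, "d": 4.0, "e": 4.0, "f": 4.0}
ranking = rank_communities(communities, importance)
```

Node "d" ends up in both communities, which is the overlapping behavior that lets one molecule participate in several functional modules. The brute-force clique enumeration is only suitable for toy graphs.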
Performance evaluation of multi-omics data integration analysis
We benchmarked MODA against several popular analysis pipelines and multi-omics integration methods using PRAD datasets. Two established R-based multi-omics integration methods, MOFA2 [20] (R package version 0.99.5) and mixOmics [21] (R package version 6.10.9), were included for comparison. Both methods were implemented strictly according to their official documentation, utilizing recommended default parameters. During analysis, MOFA2 could not process the multi-omics data of PRAD due to the high-dimensional features. Therefore, dimensionality reduction analysis was performed, and the top 1500 ranked genes were extracted based on ML-derived importance scores, while all metabolites were included due to their smaller quantity. In both methods, we analyzed the comparative influence of each omics type by examining their feature weights.
Additionally, three DL methods were selected to compare with MODA, including MOMA [22], MoGCN [23], and OmiEmbed [24]. MOMA applies geometric feature vectorization with an attention mechanism to prioritize important modules in multi-omics data. MoGCN [23] originally required three omics types as input, thus limiting its use. For this study’s PRAD data, we modified the original code to enable the integration of two-omics PRAD data.
Multi-omics data integration analysis generalization experiment
MODA was applied to transcriptomic data from 21 types of cancers in the TCGA database following standard operating procedures. We assessed the specificity of the hub molecules and pathways captured in different cancers by calculating the Jaccard Index (JI):

$$\mathrm{JI}(C_1, C_2) = \frac{|C_1 \cap C_2|}{|C_1 \cup C_2|}$$

where $C_1$ and $C_2$ represent the sets of hub molecules or crucial pathways of two specific cancers, respectively.
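The Jaccard Index is the size of the intersection divided by the size of the union of the two sets. A minimal sketch follows; the function name and the toy hub-molecule sets are hypothetical, and returning 0.0 for two empty sets is a convention chosen here.

```python
def jaccard_index(c1, c2):
    """Jaccard Index of two collections, treated as sets."""
    s1, s2 = set(c1), set(c2)
    union = s1 | s2
    if not union:
        return 0.0  # convention for two empty sets
    return len(s1 & s2) / len(union)


# Hypothetical hub-molecule sets for two cancer types.
hubs_cancer_1 = ["g1", "g2", "g3"]
hubs_cancer_2 = ["g2", "g3", "g4", "g5"]
ji = jaccard_index(hubs_cancer_1, hubs_cancer_2)  # 2 shared / 5 total
```

A value near 0 indicates that the two cancers share few hub molecules or pathways (high specificity), while a value near 1 indicates nearly identical sets.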
Omics analysis
Detailed procedures for omics profiling in the two independent validation cohorts are provided below.
Metabolomics analysis
The metabolomics analysis and data processing were described previously [25, 26]. Briefly, 10 mg of wet tissue was homogenized with a porcelain bead and mixed with 300 μL of ice-cold methanol containing internal standards. After adding methyl tert-butyl ether (MTBE), the mixtures were centrifuged to separate MTBE-rich and methanol/water-rich phases. An ACQUITY™ ultra-performance liquid chromatography (UPLC, Waters Corporation, Manchester, UK) system with BEH C8 1.7 μm (2.1 × 100 mm) and HSS T3 1.8 μm (2.1 × 100 mm) columns was used for metabolite separation in both positive and negative ion modes. TOF MS full scan was performed using a TripleTOF 5600 Plus (Applied Biosystems, Foster City, CA) mass spectrometer coupled with the UPLC system.
Lipidomics analysis
As described previously [27], eight lipid internal standards were added. Samples were prepared and extracted from homogenized mixtures according to our group’s standardized process and then analyzed using a UPLC system coupled with a TripleTOF 5600 Plus for global lipidomics profiling.
mRNA profiling and miRNA profiles
Transcriptomics analysis was performed by RNA-seq as previously described [25, 28]. miRNA expression profiling was performed following established protocols [25]. miRNA target genes were predicted using PITA (https://genie.weizmann.ac.il/pubs/mir07/mir07_data.html).
Cell experiment
Cell experiments were performed using 293T and DU145 cell lines cultured under standard conditions. Detailed methods for lentiviral transduction (siRNA structure, Supplementary Fig. S3), Western blotting, CCK-8, and colony formation assays are provided in Supplementary data 1.
Results
Multi-omics data integration analysis constructed disease-specific biological networks by integrating prior knowledge with multi-omics data
We constructed an initial biological knowledge graph comprising 660 072 edges and 8031 metabolites, 8073 genes, 8902 proteins, 4172 enzymes, and 2597 miRNAs. We applied MODA to a prostate cancer (PRAD) case study using multi-omics datasets from TCGA, including transcriptomics and miRNA expression data. In the absence of metabolomics data, metabolic fluxes were predicted using a genome-scale metabolic network (GSMN) model, leveraging transcriptomics data from the Recon3D database and encompassing 831 metabolites. Thus, three types of omics data (transcriptomics, miRNA, and metabolomics) were used for classification, providing complementary insights into PRAD stages. PCA demonstrated a clear separation between ANTs and T2/T3 stages, with a subtler distinction between T2 and T3 (Supplementary Fig. S4).
We evaluated the importance scores for omics molecules using univariate analysis (Supplementary Tables S2–S4), ML algorithms, and network topology indices (degree, closeness, betweenness, and eigenvector centrality). In total, 2319, 2872, and 164 differentially expressed genes (DEGs) were found in the three comparisons, along with 350, 388, and 201 differentially expressed miRNAs (DEMi) and 107, 87, and 38 differentially expressed metabolites (DEMe) (P-FDR < .05, |log2FC| > 1 for genes and miRNAs, |log2FC| > 0.26 for metabolites). Forty-three key molecules (2 metabolites, 38 genes, and 3 miRNAs) showed consistent differential expression across multiple comparisons. These molecules were further validated via RF analysis (ntree = 200) with importance scores >1 (Supplementary Table S5). A classification model constructed using these molecules achieved a mean AUC of 0.778 (SD = 0.068) in five-fold cross-validation, and an AUC of 0.788 (95% CI: 0.638–0.916) in the validation set (Fig. 2A). Subsequently, a PRAD-specific network was constructed by identifying the two-hop neighbors of the 43 initially important molecules within the initial biological knowledge graph, after removing high-degree nodes (>100). The resulting network contained 473 enzymes, 117 genes, 986 metabolites, 5517 proteins, and 78 653 protein–protein interactions (Fig. 2B–D). After mapping the multi-omics dataset into the PRAD-specific network, nodes were categorized into 4280 Feat_nodes (with omics data mapping) and 2813 Hidden_nodes (without omics data mapping).
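The two-hop expansion with hub removal can be sketched as a breadth-first search. This is an illustration under assumptions, not the authors' code: it assumes high-degree nodes are dropped before the expansion and that a seed exceeding the degree cutoff is dropped as well; the function names and the toy graph are hypothetical.

```python
from collections import deque


def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def k_hop_subgraph(adjacency, seeds, k=2, max_degree=100):
    """Induced subgraph on all nodes within k hops of the seeds.

    Nodes whose degree exceeds max_degree are removed before the search,
    so paths cannot run through them.
    """
    kept = {n for n, nbrs in adjacency.items() if len(nbrs) <= max_degree}
    frontier = deque((s, 0) for s in seeds if s in kept)
    visited = {s for s, _ in frontier}
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue  # do not expand beyond k hops
        for nbr in adjacency[node]:
            if nbr in kept and nbr not in visited:
                visited.add(nbr)
                frontier.append((nbr, depth + 1))
    return {n: {m for m in adjacency[n] if m in visited} for n in visited}


# Toy graph: chain s - a - b - c, plus a hub "h" attached to the seed "s".
edges = [("s", "a"), ("a", "b"), ("b", "c"), ("s", "h"),
         ("h", "x1"), ("h", "x2"), ("h", "x3"), ("h", "x4")]
graph = build_adjacency(edges)
without_hub = k_hop_subgraph(graph, seeds=["s"], k=2, max_degree=3)
with_hub = k_hop_subgraph(graph, seeds=["s"], k=2, max_degree=100)
```

With the low cutoff, the hub and everything reachable only through it drop out, leaving the chain up to two hops from the seed. Removing hubs in this way keeps a handful of promiscuous nodes from pulling most of the knowledge graph into the disease-specific network.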
Figure 2.
Construction of the PRAD-specific biological network through prior knowledge and multi-omics data. (A) ROC analysis of the classification model built on the 43 initially important molecules, which were selected by an RF model from omics molecules that were consistently differential across multiple comparisons in univariate and ML analyses. The bar chart illustrates the proportion of different molecule types among the initially important molecules. The boxplot presents the cross-validation performance of the RF model. (B) Node composition and (C) interaction composition of the PRAD-specific biological network, with colors representing distinct node types or interaction relationships. (D) A schematic diagram of the PRAD-specific biological network. AUC, area under the curve.
Multi-omics data integration analysis effectively extracted crucial information from observed and unobserved molecules
MODA used a PRAD-specific network and an importance scores matrix as input to predict importance scores for Hidden_nodes, which had no prior training data, making MODA suitable for inductive learning tasks. The changing trend of the loss function (mean squared error) demonstrated good fitting during training and still exhibited excellent convergence results in the validation set (Fig. 3A). Finally, the trained MODA model was used to predict the importance scores of Hidden_nodes after improving graph embedding (Supplementary Table S6). Notably, the distribution of importance scores for Hidden_nodes closely resembled that of Feat_nodes, indicating no significant bias in the predictions (Fig. 3B). By combining the scores of Feat_nodes and Hidden_nodes, the ability of these molecules to distinguish between different stages of PRAD was further enhanced.
Figure 3.
Training the MODA framework to identify hub molecules and disease-related modules. (A) The convergence of the loss function during DL, with training (circle) and validation (triangle). (B) Density plots of importance scores for Feat_nodes and Hidden_nodes. (C) Proportion and scoring of disease-related functional communities. The bar plot displays scores for different functional communities using distinct colors. The left pie chart shows the proportion of nodes within functional communities, while the right pie chart indicates the percentage of molecules within these communities. (D) Sankey plot illustrating the relationships between functional communities and associated biological processes. (E) Importance scores of significant pathways from enrichment analysis. The bubble size represents the pathway importance.
Compared with traditional pathway enrichment analysis, the MODA framework effectively captured core modules and removed irrelevant information. As shown in Fig. 3C and Table 1, seven PRAD-related communities were extracted by CPM and ranked by community importance scores. Low-scoring communities were merged and defined as unrelated communities (including 98.87% of nodes). Molecules involved in these seven disease-related communities are summarized in Supplementary Table S7. The top two ranked communities were community_33 (score: 5.704) and community_20 (score: 5.458). The biological functions of the seven important communities were further explored, revealing PRAD-specific modules. The enrichment analysis highlighted distinct metabolic patterns for each module. Community_20 (Fig. 3D) was mainly associated with metabolic pathways, including lipid metabolism, glycan biosynthesis, and amino acid metabolism (Supplementary Table S8), while fatty acid degradation had the highest enrichment score (Fig. 3E). Other PRAD-related modules, such as community_71 and community_150, were associated with energy metabolism, signal transduction, translation, and carbohydrate metabolism.
Table 1.
The information on the disease-related communities
| Community | Score | Rank | Number of nodes |
|---|---|---|---|
| Community_33 | 5.704 | 1 | 4 |
| Community_20 | 5.458 | 2 | 4 |
| Community_150 | 1.234 | 3 | 15 |
| Community_213 | 1.177 | 4 | 16 |
| Community_71 | 0.936 | 5 | 23 |
| Community_354 | 0.429 | 6 | 6 |
| Community_245 | 0.193 | 7 | 12 |
| Unrelated community | – | – | 7013 |
Unlike most multi-omics tools that select hub molecules based on statistical significance, MODA identified hub molecules by integrating both change trends and biological functions. MODA simultaneously considered both the node’s importance scores and whether it belongs to disease-related modules. Finally, we identified 58 hub nodes with higher scores within disease-related communities (Supplementary Table S7), including the Hidden_node of C00318 (carnitine), which had the highest score in community_20 (score: 4.172).
Multi-omics data integration analysis mined hidden information to enhance classification performance
To assess the classification performance of MODA, metabolomics, mRNA, and miRNA expression profiles were analyzed in two additional batches of clinical samples from PRAD patients. The baseline information of the two batch sets revealed no significant differences in Gleason grade and Prostate-Specific Antigen (PSA) levels between Batches 1 and 2 (Table 2, P > .05). Among the 58 disease-related molecules identified by MODA, 52 were detected across both datasets, including one Hidden_node (C00318, carnitine) (Supplementary Table S9). Differential analysis for the different stages of PRAD is summarized in Supplementary Table S9. Among them, 20 molecules showed significant differences in at least one comparison, supporting the feasibility and interpretability of MODA in selecting key molecules.
Table 2.
Baseline information on the two batch sets
| Batch | Group | Stage | Number | Gleason | PSA |
|---|---|---|---|---|---|
| Batch 1 | ANT | – | 21 | 7.65 ± 1.182 | 16.660 ± 14.804 |
| | Cancerous | – | 19 | 7.53 ± 1.073 | 17.080 ± 15.091 |
| | | Localized | 13 | 7.23 ± 0.832 | 13.436 ± 7.915 |
| | | Metastatic | 6 | 8.17 ± 1.329 | 24.967 ± 23.670 |
| | Localized–metastatic | Z value | – | 0.14 | 0.335 |
| | | P value | – | .179 | .368 |
| Batch 2 | ANT | – | 50 | 7.38 ± 1.008 | 23.450 ± 18.815 |
| | Cancerous | – | 50 | 7.40 ± 1.050 | 23.360 ± 18.870 |
| | | Localized | 31 | 7.35 ± 1.082 | 20.565 ± 16.186 |
| | | Metastatic | 19 | 7.47 ± 1.020 | 27.915 ± 22.301 |
| | Localized–metastatic | Z value | – | 0.529 | 1.369 |
| | | P value | – | .597 | .171 |
To comprehensively evaluate classification performance, we benchmarked the 52 detectable key molecules identified by MODA against seven established methods in a three-class task distinguishing ANT, localized, and metastatic prostate cancer. The comparison included a traditional approach (RF), two R-based multi-omics integration tools (MOFA2 and mixOmics), and three DL-based models (MOMA, MoGCN, and OmiEmbed). Notably, OmiEmbed [29] operated as a black-box model incapable of feature attribution; because it could not identify key disease-associated molecules, it was excluded from the comparisons. PSA levels were assessed as a clinical comparator. However, PSA allowed only binary classification of localized versus metastatic cases, since it is measured in blood and cannot differentiate cancerous tissue from ANT.
The MODA-derived panel showed superior performance compared with other methods (Batch 1 AUC = 0.923; Batch 2 AUC = 0.806, P < .05) (Fig. 4A), except for Batch 2, where the performance of MoGCN was not significantly different (P = .745). Additionally, MODA’s diagnostic model significantly outperformed the traditional clinical indicator (PSA) in distinguishing patients at different stages (Batch 1: AUC, 0.641; Batch 2: AUC, 0.616, P < .05). As shown in Supplementary Table S10, in the ANT class, MODA achieved perfect results in Batch 1 and near-perfect results in Batch 2 (Batch 1: recall = 1.000, specificity = 1.000, F1-score = 1.000, PR-AUC = 1.000; Batch 2: recall = 1.000, F1-score = 0.952, PR-AUC = 0.983), demonstrating high accuracy in distinguishing nontumor samples. For the localized class, which represents a more biologically heterogeneous group, MODA maintained balanced performance across metrics (Batch 1: F1-score = 0.818, specificity = 1.000, PR-AUC = 0.888; Batch 2: F1-score = 0.793, specificity = 0.942, PR-AUC = 0.778), outperforming or matching competing methods on key metrics such as specificity. In the metastatic class, MODA demonstrated strong sensitivity and precision (Batch 1: recall = 1.000, F1-score = 0.750, PR-AUC = 0.955; Batch 2: recall = 0.789, F1-score = 0.811, PR-AUC = 0.876), indicating robust capability for detecting advanced disease. Notably, MODA consistently outperformed other baseline methods in PR-AUC across batches, especially in the ANT and metastatic classes. Calibration curves also indicated that the MODA model demonstrated more stable performance compared with the other methods (Fig. 4B).
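For readers less familiar with per-class metrics in a three-class task, the sketch below computes recall, specificity, precision, and F1-score in a one-vs-rest fashion. The labels and predictions are toy values invented for illustration and do not reproduce any result reported here; the function name is hypothetical.

```python
def one_vs_rest_metrics(y_true, y_pred, positive):
    """Recall, specificity, precision, and F1 for one class against the rest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"recall": recall, "specificity": specificity,
            "precision": precision, "f1": f1}


# Toy labels: one localized sample is misclassified as metastatic.
y_true = ["ANT", "ANT", "localized", "localized", "metastatic", "metastatic"]
y_pred = ["ANT", "ANT", "localized", "metastatic", "metastatic", "metastatic"]
metastatic = one_vs_rest_metrics(y_true, y_pred, "metastatic")
ant = one_vs_rest_metrics(y_true, y_pred, "ANT")
```

In this toy case the metastatic class has perfect recall but an F1-score of 0.8, because the single false positive lowers precision. The same mechanism allows a class to combine a recall of 1.000 with a lower F1-score.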
Figure 4.
Classification performance of the hub molecules validated in the independent dataset. (A) Comparisons of classification performance between MODA and other methods in Batch 1 (left) and Batch 2 (right). (B) Calibration curves for the MODA and other models in Batch 1 (left) and Batch 2 (right). (C) Comparisons on the classification performance among Feat + Hidden, Feat, and Hidden hub molecule panels in Batch 1 (left) and Batch 2 (right). (D) Calibration curves of the Feat + Hidden, Feat, and Hidden panels in Batch 1 (left) and Batch 2 (right). (E) Bee swarm plots visualizing the predicted results from Batch 1 (left) and Batch 2 (right). Feat + Hidden panels include 52 hub molecules detectable in the independent dataset among the 58 hub molecules identified by MODA. The Hidden hub molecule refers to C00318, which was undetected in the discovery set (hidden_nodes) but is detectable in the independent validation set, enabling experimental confirmation of MODA prediction.
We further validated the role of Hidden_nodes in classification tasks. The 52 detectable hub molecules identified by MODA were divided into three panels: Feat + Hidden (N = 52), Feat (N = 51), and Hidden (C00318) hub molecules. The Hidden panel alone demonstrated excellent classification performance in both Batch 1 (AUC, 0.926) and Batch 2 (AUC, 0.952) (Fig. 4C). The inclusion of Hidden_nodes in the diagnostic panel significantly improved the overall performance and stability (Fig. 4D), particularly in Batch 2 (Hidden versus Feat + Hidden, P = .008; Hidden versus Feat, P < .001). The bee swarm plots (Fig. 4E) also displayed improved separation between PRAD stages when Hidden_nodes were included, highlighting the contribution of Hidden_nodes to the model’s diagnostic capability.
Multi-omics data integration analysis uncovered novel mechanisms linked to prostate cancer
We performed differential analysis of all molecules in the validation sets to assess the pathway disorders. DEMe, DEMi (P < .05, |log2FC| > 0.26), and DEGs (P < .05, |log2FC| > 1) were selected between cancerous tissues and ANTs, as well as between localized and metastatic stages. The detailed differential information is summarized in Supplementary Tables S11–S13 (Batch 1, Supplementary data), as well as in Supplementary Tables S14–S16 (Batch 2, Supplementary data). PCA score plots showed a clear separation of different PRAD stages (Supplementary Fig. S5A–S5F). The significant pathways were identified across both batches (Supplementary Fig. S6A). Notably, pathways related to cell growth and death, lipid metabolism, carbohydrate metabolism, and amino acid metabolism were confirmed, with “cellular processes” achieving the highest score (|log10p|). Lipid metabolism and its downstream correlated pathways still exhibited important roles in PRAD progression, consistent with MODA’s training predictions (Supplementary Fig. S6B). Further lipidomics analysis revealed statistically significant changes in four lipid classes, namely free fatty acids (FFAs), phosphatidylcholine (PC), phosphatidylethanolamine (PE), and triacylglyceride (TAG), throughout PRAD progression (Supplementary Fig. S7). We observed elevated FFA synthesis, particularly of long-chain fatty acids, while PC and PE levels increased in localized PRAD but decreased during the progression of PRAD. In contrast, TAGs showed the opposite trend.
To further investigate the mechanisms underlying key disease processes identified by MODA, in silico gene knockdown and experimental validation were performed on core disease-related functional modules. Molecules that were significant in both MODA and traditional methods were defined as Seed_nodes, including C00882 (dephospho-CoA), C02990 (palmitoylcarnitine), TROAP, C11orf87, and HOXB5 (Table 3 and Supplementary Fig. S8). Together with the Hidden_node (C00318, carnitine), six molecules were selected for further biological validation. After validation using an external dataset, TROAP, palmitoylcarnitine, and carnitine still exhibited significant trends in PRAD progression (Supplementary Table S9). Given their biological relevance, palmitoylcarnitine and carnitine were selected for further analysis.
Table 3.
Information on the Seed_nodes in disease-related communities
Using a GSMN, we identified 42 and 19 genes that significantly influenced the metabolic flux of carnitine and palmitoylcarnitine, respectively (Fig. 5A). Nine genes (SETDB2, BBOX1, DOT1L, KMT5C, SETD1A, KMT2B, SETDB1, ALDH9A1, and SETD1B) were shared (Fig. 5B). Notably, DOT1L [30, 31], KMT5C [32], SETD1A [33], KMT2B [34], SETDB1 [30], ALDH9A1 [35], and SETD1B [36] had been previously reported to regulate PRAD progression. However, the role of BBOX1 in PRAD had not previously been explored. Using the COBRA Toolbox, metabolic dysfunction caused by silencing BBOX1 was predicted in the TCGA dataset and validated in DU145 cells (Fig. 5C). The results revealed that BBOX1 was involved in steroid hormone biosynthesis and fatty acid elongation.
Figure 5.
MODA identifies and confirms a novel molecular mechanism driving prostate cancer progression. (A, B) Venn diagram and bar plot of the key genes regulating C00318 (left circle and lower bar) and C02990 (right circle and upper bar). (C) Impact of BBOX1 knockout on biological pathways (left, TCGA data; right, our data). (D) 2D colony formation assay. DMEM, Dulbecco’s modified Eagle medium. (E) CCK-8 cell proliferation in transfected cells cultured in DMEM alone or supplemented with carnitine/palmitoylcarnitine. D1C, DU145_sh1 cultured in DMEM + carnitine; D1W, DU145_sh1 cultured in DMEM; D1P, DU145_sh1 cultured in DMEM + palmitoylcarnitine; DCC, DU145_shNC cultured in DMEM + carnitine; DCW, DU145_shNC cultured in DMEM; DCP, DU145_shNC cultured in DMEM + palmitoylcarnitine. (F) PCA score plots (left, positive ion mode; right, negative ion mode). (G) The proportion of significantly differential lipids. (H) Enrichment analysis of differential molecules between DU145_sh1 and DU145_shNC.
In vitro experiments were conducted using BBOX1-knockdown DU145 cells (DU145_sh1) and control cells (DU145_shNC). Western blotting revealed a significant decrease in BBOX1 expression in knockdown cells compared with DU145_shNC (Supplementary Fig. S9A). Further assays (CCK-8 and colony formation) demonstrated that BBOX1 knockdown inhibited DU145 cell proliferation and viability, with reduced growth rates observed at multiple time points (Fig. 5D and E, Supplementary Table S17). To investigate the functional role of BBOX1 in prostate cancer, we performed lipidomics analysis and found significant alterations in lipid profiles following BBOX1 knockdown. Notably, most lipids exhibited a significant difference (P < .05, |log2FC| > 0.263) (Fig. 5F). Ceramides (20.25%), PC (8.4%), and sphingomyelin (6.17%) were among the most affected lipid classes. Enrichment analysis revealed that BBOX1 knockdown disrupted pathways consistent with findings from the validation set (Fig. 5G). Unsaturated fatty acid biosynthesis emerged as a key metabolic pathway in PRAD progression (Fig. 5H).
To determine whether the observed inhibition of cell proliferation was due to decreased levels of carnitine and palmitoylcarnitine, we conducted supplementation experiments. As illustrated in Fig. 5D and E, carnitine supplementation significantly improved cell growth in both DU145_shNC and DU145_sh1, with the knockdown cells exhibiting a stronger response to supplementation (Supplementary Table S17). The colony formation assay also supported the conclusion that carnitine supplementation increased the cell growth rate. In contrast, palmitoylcarnitine supplementation did not affect cell growth. These results suggested that carnitine metabolism was regulated by BBOX1 and played a pivotal role in PRAD progression. These findings validated MODA’s capability to uncover hidden disease mechanisms and to identify key molecular pathways involved in PRAD. BBOX1 was shown to regulate carnitine metabolism, affecting lipid biosynthesis and cell proliferation, providing new insights into PRAD biology and potential therapeutic targets.
Multi-omics data integration analysis demonstrated good generalization across diverse cancer datasets
We further demonstrated the generalizability of MODA on 21 cancer types in the TCGA transcriptomics datasets with varying sample sizes (ranging from 44 to 1231). The results revealed that the disease-specific networks constructed for different cancers contained varying proportions of Hidden_nodes, except for SKCM (skin cutaneous melanoma) and SARC (sarcoma) (Fig. 6A). These two disease-specific networks had fewer nodes and a lower average path length (Fig. 6B). The extraction of disease-specific networks within MODA relied on the initially important molecules identified by RFs during the inter-group analysis. However, SKCM and SARC included an extremely low number of ANT samples (Fig. 6A). This severe class imbalance might compromise ML training and consequently affect MODA’s ability to capture hidden information. The JI of the key molecules identified by MODA across different cancers showed that these molecules were highly cancer-specific (JI < 0.05) (Fig. 6C), and similar findings were observed for the crucial pathways (JI < 0.3) (Fig. 6D and E). Integrating these results, a cancer-pathway knowledge graph was constructed based on the 21 cancer types (Supplementary Fig. S10).
Figure 6.
Assessment of the generalization performance of MODA across multiple cancers. (A) Overview of the disease-specific networks (left) and the sample labels (right). Network visualization: left, Hidden_nodes; right, Feat_nodes; left, tumor tissue; right, ANT. (B) The number of nodes and the average path length within each disease-specific network. (C) The JI of hub nodes obtained by MODA for each type of cancer. (D) The JI of the crucial pathways identified by MODA for each cancer. (E) The number of crucial pathways extracted from MODA across various cancer types. Red indicates pathways annotated by the KEGG (Kyoto Encyclopedia of Genes and Genomes) database, and green signifies pathways annotated by the Reactome database.
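The JI values reported for hub nodes and pathways compare sets pairwise: the size of the intersection divided by the size of the union, so a value near 0 indicates that two cancers share almost no hub molecules. A minimal sketch of this pairwise comparison follows; the cancer labels and gene identifiers are hypothetical placeholders rather than MODA output.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B|; defined here as 0.0 for two empty sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical hub-molecule sets for three cancer types (placeholders only).
hubs = {
    "cancer_X": {"g1", "g2", "g3", "g4"},
    "cancer_Y": {"g4", "g5", "g6", "g7"},
    "cancer_Z": {"g8", "g9"},
}

# One JI value per unordered pair of cancer types.
pairwise = {
    (x, y): jaccard(hubs[x], hubs[y])
    for x, y in combinations(sorted(hubs), 2)
}
```

In this toy example, cancer_X and cancer_Y share one of seven distinct molecules (JI of about 0.14), while the other pairs share none (JI of 0).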
Discussion
In this study, we introduce MODA, a DL-based framework designed to integrate multi-omics data for comprehensive analysis across various molecular levels. Our results demonstrate that MODA outperforms existing methods in biomarker discovery and elucidation of crucial disease mechanisms. Its accuracy and biological interpretability have been validated through population data and in vitro experiments.
Unlike other state-of-the-art methods that primarily concentrate on individual samples [37, 38], MODA employs a molecular network framework that integrates a variety of omics layers. By leveraging pre-existing experimental data with various ML methods, MODA enhances data reliability and effectively addresses challenges such as limited sample sizes and imbalanced labeling. MODA incorporates a GCN with multihead attention to improve the robustness of molecular importance scoring, including for nodes without prior experimental measurements. Lastly, the CPM algorithm identifies central functional modules that govern disease progression by eliminating unrelated molecules. By integrating alteration trends with biological functions, MODA effectively identifies pivotal molecules, further substantiated by gene knockout simulations.
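The overlapping-community step can be illustrated with a small example. The sketch below assumes that CPM denotes the clique percolation method, in which two k-cliques belong to the same community when they share k − 1 nodes, so that a node may appear in several communities. It is a brute-force illustration for small graphs, not the MODA implementation; the function name `k_clique_communities` and the toy edge list are hypothetical.

```python
from itertools import combinations


def k_clique_communities(edges, k=3):
    """Clique percolation: merge k-cliques that share k-1 nodes."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    nodes = sorted(adj)
    # Enumerate all k-cliques by brute force (suitable for small graphs only).
    cliques = [
        frozenset(c)
        for c in combinations(nodes, k)
        if all(b in adj[a] for a, b in combinations(c, 2))
    ]
    # Union-find over cliques: adjacent cliques share exactly k-1 nodes.
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:
            parent[find(i)] = find(j)
    # Each connected group of cliques yields one community (union of its nodes).
    groups = {}
    for i, clique in enumerate(cliques):
        groups.setdefault(find(i), set()).update(clique)
    return sorted(sorted(g) for g in groups.values())


# Toy network: triangles A-B-C and B-C-D share an edge; triangle D-E-F
# touches them only at node D, so D belongs to two communities.
toy_edges = [
    ("A", "B"), ("A", "C"), ("B", "C"),
    ("B", "D"), ("C", "D"),
    ("D", "E"), ("D", "F"), ("E", "F"),
]
communities = k_clique_communities(toy_edges, k=3)
```

Here node D is a member of both communities, which is the property that lets a module participate in more than one pathway instead of being forced into a single partition.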
The application of MODA to prostate cancer (PRAD) highlights its capacity to identify hub molecules and disrupted pathways relevant to cancer progression. MODA effectively removes a large amount of irrelevant information and extracts key molecules with superior classification performance (Fig. 4A–C). Using RF, we extracted four Feat_nodes (TROAP, C11orf87, HOXB5, and palmitoylcarnitine) and one Hidden_node (carnitine) for further biological validation. Previous studies have associated palmitoylcarnitine [39], carnitine [40], and HOXB5 [41] with prostate cancer, supporting the biological relevance of our findings. Moreover, independent validation datasets further demonstrate that carnitine, palmitoylcarnitine, and TROAP could serve as potential biomarkers for PRAD screening (Fig. 4; Supplementary Table S9). Gene knockdown experiments identify BBOX1 as a regulator of both carnitine and palmitoylcarnitine metabolism, affecting PRAD development and proliferation (Fig. 5). A key strength of our study is the complementary use of TCGA–PRAD for performance benchmarking and in-house metabolomics-inclusive cohorts for clinical and experimental validation, which is not feasible with TCGA datasets alone.
Compared with traditional pathway enrichment analysis, MODA identifies crucial functional modules from overlapping communities that encompass multiple omic layers. These modules are involved in various pathways and provide insight into mechanisms driving disease progression. For instance, apart from glycolysis, which serves as the primary energy source for cancer cells [42], MODA identified pathways associated with fatty acid oxidation (FAO), such as fatty acid degradation and lysine degradation, as critical metabolic dependencies in PRAD (Fig. 3D). This is mechanistically supported by the overexpression of CPT1A, the rate-limiting enzyme that conjugates carnitine to long-chain fatty acids (e.g. forming palmitoylcarnitine for mitochondrial import), in aggressive tumors [43], where it correlates with poor prognosis [44]. Supplementary Table S8 further reveals multiple PRAD-related pathways linked to FAO dysregulation. The Wnt signaling pathway influences the cellular energy state and indirectly regulates FAO activity. Sphingolipid metabolism and lysine degradation also contribute to fatty acid metabolism and feed intermediates into the TCA cycle. Moreover, pathways such as inositol phosphate metabolism also modulate cellular energy balance.
Carnitine serves as a key molecule in lipid metabolism, mediating the β-oxidation of long-chain FFAs by transferring them into the mitochondria through carnitine translocase [45]. Lysine degradation provides the carbon skeletons for carnitine synthesis [45]. In vitro experiments also suggest that carnitine supplementation accelerates cell growth, implying that the carnitine system can serve as a key regulator of metabolic flexibility in cancer cells by regulating lipid metabolism [46]. Lipidomics analysis of tissue samples (Supplementary Fig. S7) shows a significant increase in long-chain FFAs. In contrast, short- and medium-chain FFA levels remain unchanged, as they can enter mitochondria without the carnitine/acylcarnitine transport system [47]. The MODA framework identifies BBOX1 as a regulator of carnitine and palmitoylcarnitine metabolism through gene knockdown simulations. BBOX1 knockdown significantly decreases cell proliferation, highlighting its potential as a therapeutic target for modulating FAO [48].
Conclusion
MODA provides a powerful framework for advancing multi-omics research by constructing disease-specific knowledge graphs and applying GCN-based models for effective data integration, even with small sample sizes. By inferring the importance scores of undetectable molecules and delineating core functional modules associated with disease progression, MODA enhances the identification of critical disease mechanisms. This approach maximizes data utilization while reducing experimental costs, offering a valuable tool to support mechanistic insights and inform precision medicine strategies. As more metabolomics-inclusive cohorts become available, MODA can be further extended across diverse diseases to uncover novel metabolic dependencies and therapeutic targets.
Key Points
Multi-omics data integration analysis (MODA) integrates multi-omics data using a graph convolutional network framework with attention mechanisms to identify hub molecules and elucidate disease pathways, outperforming existing methods.
MODA constructs comprehensive knowledge-driven graphs and disease-specific networks, enhancing training efficiency and biological specificity.
MODA predicts metabolic flux from transcriptome data, expanding omics dimensions for a more comprehensive understanding of disease mechanisms.
MODA utilizes various machine learning algorithms to transform raw omics data into feature importance matrices, facilitating the integration of heterogeneous omics data and enabling effective training with limited sample sizes.
MODA employs a GCN-based overlapping community detection algorithm to identify biological functional modules, surpassing conventional pathway annotation constraints and capturing core molecular mechanisms with high precision.
Supplementary Material
Contributor Information
Jinhui Zhao, State Key Laboratory of Medical Proteomics, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, No. 457 Zhongshan Road, Shahekou District, Dalian, Liaoning 116023, P.R. China; University of Chinese Academy of Sciences, No. 1 Yanqihu East Road, Huairou District, Beijing 100049, P.R. China; Liaoning Province Key Laboratory of Metabolomics, No. 457 Zhongshan Road, Shahekou District, Dalian, Liaoning 116023, P.R. China.
Yanyan Zhou, State Key Laboratory of Medical Proteomics, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, No. 457 Zhongshan Road, Shahekou District, Dalian, Liaoning 116023, P.R. China; Liaoning Province Key Laboratory of Metabolomics, No. 457 Zhongshan Road, Shahekou District, Dalian, Liaoning 116023, P.R. China; Hepatobiliary Surgery Department, Dalian Medical University, No. 9, West Section of Lvshun South Road, Dalian 116044, P.R. China.
Han Bao, State Key Laboratory of Medical Proteomics, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, No. 457 Zhongshan Road, Shahekou District, Dalian, Liaoning 116023, P.R. China; University of Chinese Academy of Sciences, No. 1 Yanqihu East Road, Huairou District, Beijing 100049, P.R. China; Liaoning Province Key Laboratory of Metabolomics, No. 457 Zhongshan Road, Shahekou District, Dalian, Liaoning 116023, P.R. China.
Xinjie Zhao, State Key Laboratory of Medical Proteomics, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, No. 457 Zhongshan Road, Shahekou District, Dalian, Liaoning 116023, P.R. China; University of Chinese Academy of Sciences, No. 1 Yanqihu East Road, Huairou District, Beijing 100049, P.R. China; Liaoning Province Key Laboratory of Metabolomics, No. 457 Zhongshan Road, Shahekou District, Dalian, Liaoning 116023, P.R. China.
Xinxin Wang, State Key Laboratory of Medical Proteomics, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, No. 457 Zhongshan Road, Shahekou District, Dalian, Liaoning 116023, P.R. China; University of Chinese Academy of Sciences, No. 1 Yanqihu East Road, Huairou District, Beijing 100049, P.R. China; Liaoning Province Key Laboratory of Metabolomics, No. 457 Zhongshan Road, Shahekou District, Dalian, Liaoning 116023, P.R. China.
Chunxia Zhao, State Key Laboratory of Medical Proteomics, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, No. 457 Zhongshan Road, Shahekou District, Dalian, Liaoning 116023, P.R. China; University of Chinese Academy of Sciences, No. 1 Yanqihu East Road, Huairou District, Beijing 100049, P.R. China; Liaoning Province Key Laboratory of Metabolomics, No. 457 Zhongshan Road, Shahekou District, Dalian, Liaoning 116023, P.R. China.
Wangshu Qin, State Key Laboratory of Medical Proteomics, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, No. 457 Zhongshan Road, Shahekou District, Dalian, Liaoning 116023, P.R. China; University of Chinese Academy of Sciences, No. 1 Yanqihu East Road, Huairou District, Beijing 100049, P.R. China; Liaoning Province Key Laboratory of Metabolomics, No. 457 Zhongshan Road, Shahekou District, Dalian, Liaoning 116023, P.R. China.
Xin Lu, State Key Laboratory of Medical Proteomics, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, No. 457 Zhongshan Road, Shahekou District, Dalian, Liaoning 116023, P.R. China; University of Chinese Academy of Sciences, No. 1 Yanqihu East Road, Huairou District, Beijing 100049, P.R. China; Liaoning Province Key Laboratory of Metabolomics, No. 457 Zhongshan Road, Shahekou District, Dalian, Liaoning 116023, P.R. China.
Guowang Xu, State Key Laboratory of Medical Proteomics, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, No. 457 Zhongshan Road, Shahekou District, Dalian, Liaoning 116023, P.R. China; University of Chinese Academy of Sciences, No. 1 Yanqihu East Road, Huairou District, Beijing 100049, P.R. China; Liaoning Province Key Laboratory of Metabolomics, No. 457 Zhongshan Road, Shahekou District, Dalian, Liaoning 116023, P.R. China.
Author contributions
G.X. and X.L. supervised the project. J.Z. developed the DL framework and led benchmarks and case studies. Y.Z. and W.Q. conducted the biological validations and functional analyses. H.B. and X.W. contributed to the optimization of the model framework. X.Z. and C.Z. provided essential feedback and improvements for the methodological framework. J.Z. drafted the manuscript, which was revised by X.L. and G.X. All authors reviewed and approved the final manuscript.
Conflict of interest: None declared.
Funding
This research was supported by the National Natural Science Foundation of China (grant nos 22434006, 22274153, and 22274151), and the Innovation Program (grant no. DICP I202334) of Science and Research from the DICP, CAS, and the AI S&T Program (grant no. DNL-YL A202202) from Yulin Branch, Dalian National Laboratory for Clean Energy, CAS, China, and Liaoning Province International Science and Technology Cooperation Program Project (grant no. 2023JH2/10700023).
Data availability
Omics profiles from multiple cancer datasets were downloaded from the TCGA database (https://portal.gdc.cancer.gov/) using the “TCGAbiolinks” R package. Clinical data, demographics, and other clinical features of these patients were extracted from the TCGA database. The additional prostate cancer multi-omics datasets can be requested from the corresponding authors. These data are not publicly available due to privacy or ethical restrictions. All related scripts and code supporting this study are publicly available at https://github.com/zhaoxiaoqi0714/MODA.
Ethics approval and consent to participate
This study was conducted according to the ethical standards of the local institutional review board, and all participants provided written informed consent before their involvement.
References
1. Nitsch L, Lareau CA, Ludwig LS. Mitochondrial genetics through the lens of single-cell multi-omics. Nat Genet 2024;56:1355–65. 10.1038/s41588-024-01794-8
2. Kernohan KD, Boycott KM. The expanding diagnostic toolbox for rare genetic diseases. Nat Rev Genet 2024;25:401–15. 10.1038/s41576-023-00683-w
3. Abedini A, Levinsohn J, Klötzer KA. et al. Single-cell multi-omic and spatial profiling of human kidneys implicates the fibrotic microenvironment in kidney disease progression. Nat Genet 2024;56:1712–24. 10.1038/s41588-024-01802-x
4. He X, Liu X, Zuo F. et al. Artificial intelligence-based multi-omics analysis fuels cancer precision medicine. Semin Cancer Biol 2023;88:187–200. 10.1016/j.semcancer.2022.12.009
5. Huttlin EL, Bruckner RJ, Navarrete-Perea J. et al. Dual proteome-scale networks reveal cell-specific remodeling of the human interactome. Cell 2021;184:3022–3040.e28. 10.1016/j.cell.2021.04.011
6. Karayel O, Virreira Winter S, Padmanabhan S. et al. Proteome profiling of cerebrospinal fluid reveals biomarker candidates for Parkinson’s disease. Cell Rep Med 2022;3:100661. 10.1016/j.xcrm.2022.100661
7. Wishart DS. Metabolomics for investigating physiological and pathophysiological processes. Physiol Rev 2019;99:1819–75. 10.1152/physrev.00035.2018
8. Zhou G, Li S, Xia J. Network-based approaches for multi-omics integration. Methods Mol Biol 2020;2104:469–87. 10.1007/978-1-0716-0239-3_23
9. Chen W, Zhang P, Zhang X. et al. Machine learning-causal inference based on multi-omics data reveals the association of altered gut bacteria and bile acid metabolism with neonatal jaundice. Gut Microbes 2024;16:2388805. 10.1080/19490976.2024.2388805
10. Wang K, Abid MA, Rasheed A. et al. DNNGP, a deep neural network-based method for genomic prediction using multi-omics data in plants. Mol Plant 2023;16:279–93. 10.1016/j.molp.2022.11.004
11. Forrest IS, Petrazzini BO, Duffy Á. et al. Machine learning-based marker for coronary artery disease: derivation and validation in two longitudinal cohorts. Lancet 2023;401:215–25. 10.1016/S0140-6736(22)02079-7
12. Chi C, Ye Y, Chen B. et al. Bipartite graph-based approach for clustering of cell lines by gene expression–drug response associations. Bioinformatics 2021;37:2617–26. 10.1093/bioinformatics/btab143
13. Theodoris CV, Xiao L, Chopra A. et al. Transfer learning enables predictions in network biology. Nature 2023;618:616–24. 10.1038/s41586-023-06139-9
14. Xuan P, Gong Z, Cui H. et al. Fully connected autoencoder and convolutional neural network with attention-based method for inferring disease-related lncRNAs. Brief Bioinform 2022;23:bbac089. 10.1093/bib/bbac089
15. Lin L, Xiong M, Zhang G. et al. A convolutional neural network and graph convolutional network based framework for AD classification. Sensors (Basel) 2023;23:1914. 10.3390/s23041914
16. Zhang Q, Wei Y, Han Z. et al. Multimodal fusion on low-quality data: a comprehensive survey. arXiv, 10.48550/arXiv.2404.18947, 1 November 2024, preprint: not peer reviewed.
17. Novakovsky G, Dexter N, Libbrecht MW. et al. Obtaining genetics insights from deep learning via explainable artificial intelligence. Nat Rev Genet 2023;24:125–37. 10.1038/s41576-022-00532-2
18. Lewis JE, Forshaw TE, Boothman DA. et al. Personalized genome-scale metabolic models identify targets of redox metabolism in radiation-resistant tumors. Cell Syst 2021;12:68–81.e11. 10.1016/j.cels.2020.12.001
19. Heirendt L, Arreckx S, Pfau T. et al. Creation and analysis of biochemical constraint-based models using the COBRA Toolbox v.3.0. Nat Protoc 2019;14:639–702. 10.1038/s41596-018-0098-2
20. Argelaguet R, Arnol D, Bredikhin D. et al. MOFA+: a statistical framework for comprehensive integration of multi-modal single-cell data. Genome Biol 2020;21:111. 10.1186/s13059-020-02015-1
21. Rohart F, Gautier B, Singh A. et al. mixOmics: an R package for ‘omics feature selection and multiple data integration. PLoS Comput Biol 2017;13:e1005752. 10.1371/journal.pcbi.1005752
22. Moon S, Lee H. MOMA: a multi-task attention learning algorithm for multi-omics data interpretation and classification. Bioinformatics 2022;38:2287–96. 10.1093/bioinformatics/btac080
23. Li X, Ma J, Leng L. et al. MoGCN: a multi-omics integration method based on graph convolutional network for cancer subtype analysis. Front Genet 2022;13:806842. 10.3389/fgene.2022.806842
24. OmiEmbed: A Unified Multi-Task Deep Learning Framework for Multi-Omics Data - PubMed. https://pubmed.ncbi.nlm.nih.gov/34207255/. (17 December 2024, date last accessed).
25. Ren S, Shao Y, Zhao X. et al. Integration of metabolomics and transcriptomics reveals major metabolic pathways and potential biomarker involved in prostate cancer. Mol Cell Proteomics 2016;15:154–63. 10.1074/mcp.M115.052381
26. Chen S, Hoene M, Li J. et al. Simultaneous extraction of metabolome and lipidome with methyl tert-butyl ether from a single small tissue sample for ultra-high performance liquid chromatography/mass spectrometry. J Chromatogr A 2013;1298:9–16. 10.1016/j.chroma.2013.05.019
27. Li J, Ren S, Piao HL. et al. Integration of lipidomics and transcriptomics unravels aberrant lipid metabolism and defines cholesteryl oleate as potential biomarker of prostate cancer. Sci Rep 2016;6:20984. 10.1038/srep20984
28. Ren S, Peng Z, Mao JH. et al. RNA-seq analysis of prostate cancer in the Chinese population identifies recurrent gene fusions, cancer-associated long noncoding RNAs and aberrant alternative splicings. Cell Res 2012;22:806–21. 10.1038/cr.2012.30
29. Zhang X, Xing Y, Sun K. et al. OmiEmbed: a unified multi-task deep learning framework for multi-omics data. Cancers (Basel) 2021;13:3047. 10.3390/cancers13123047
30. Vatapalli R, Sagar V, Rodriguez Y. et al. Histone methyltransferase DOT1L coordinates AR and MYC stability in prostate cancer. Nat Commun 2020;11:4153. 10.1038/s41467-020-18013-7
31. Thomas T. DOT1L in prostate cancer. Nat Rev Urol 2020;17:544. 10.1038/s41585-020-0374-0
32. Quan Y, Zhang X, Wang M. et al. Histone lysine methylation patterns in prostate cancer microenvironment infiltration: integrated bioinformatic analysis and histological validation. Front Oncol 2022;12:981226. 10.3389/fonc.2022.981226
33. Yang L, Jin M, Park SJ. et al. SETD1A promotes proliferation of castration-resistant prostate cancer cells via FOXM1 transcription. Cancers (Basel) 2020;12:1736. 10.3390/cancers12071736
34. Zhang J, Ding X, Peng K. et al. Identification of biomarkers for immunotherapy response in prostate cancer and potential drugs to alleviate immunosuppression. Aging (Albany NY) 2022;14:4839–57. 10.18632/aging.204115
35. Bova GS, Kallio HML, Annala M. et al. Integrated clinical, whole-genome, and transcriptome analysis of multisampled lethal metastatic prostate cancer. Cold Spring Harb Mol Case Stud 2016;2:a000752. 10.1101/mcs.a000752
36. Craddock J, Jiang J, Patrick SM. et al. Alterations in the epigenetic machinery associated with prostate cancer health disparities. Cancers (Basel) 2023;15:3462. 10.3390/cancers15133462
37. Steyaert S, Pizurica M, Nagaraj D. et al. Multimodal data fusion for cancer biomarker discovery with deep learning. Nat Mach Intell 2023;5:351–62. 10.1038/s42256-023-00633-5
38. Ding S, Li J, Wang J. et al. Multimodal co-attention fusion network with online data augmentation for cancer subtype classification. IEEE Trans Med Imaging 2024;43. 10.1109/TMI.2024.3405535
39. Al-Bakheit A, Traka M, Saha S. et al. Accumulation of palmitoylcarnitine and its effect on pro-inflammatory pathways and calcium influx in prostate cancer. Prostate 2016;76:1326–37. 10.1002/pros.23222
40. Triscott J, Lehner M, Benjak A. et al. Loss of PI5P4Kα slows the progression of a Pten mutant basal cell model of prostate cancer. Mol Cancer Res 2025;23:33–45. 10.1158/1541-7786.MCR-24-0290
41. Sekino Y, Pham QT, Kobatake K. et al. HOXB5 overexpression is associated with neuroendocrine differentiation and poor prognosis in prostate cancer. Biomedicines 2021;9:893. 10.3390/biomedicines9080893
42. Tan Y, Li J, Zhao G. et al. Metabolic reprogramming from glycolysis to fatty acid uptake and beta-oxidation in platinum-resistant cancer cells. Nat Commun 2022;13:4554. 10.1038/s41467-022-32101-w
43. Ma L, Chen C, Zhao C. et al. Targeting carnitine palmitoyl transferase 1A (CPT1A) induces ferroptosis and synergizes with immunotherapy in lung cancer. Signal Transduct Target Ther 2024;9:64. 10.1038/s41392-024-01772-w
44. Li R, Li X, Zhao J. et al. Mitochondrial STAT3 exacerbates LPS-induced sepsis by driving CPT1a-mediated fatty acid oxidation. Theranostics 2022;12:976–98. 10.7150/thno.63751
45. Meng Y, Guo D, Lin L. et al. Glycolytic enzyme PFKL governs lipolysis by promoting lipid droplet-mitochondria tethering to enhance β-oxidation and tumor cell proliferation. Nat Metab 2024;6:1092–107. 10.1038/s42255-024-01047-2
46. Melone MAB, Valentino A, Margarucci S. et al. The carnitine system and cancer metabolic plasticity. Cell Death Dis 2018;9:228. 10.1038/s41419-018-0313-7
47. Nicholas DA, Proctor EA, Agrawal M. et al. Fatty acid metabolites combine with reduced β oxidation to activate Th17 inflammation in human type 2 diabetes. Cell Metab 2019;30:447–461.e5. 10.1016/j.cmet.2019.07.004
48. Wang J, Zhou Y, Zhang D. et al. CRIP1 suppresses BBOX1-mediated carnitine metabolism to promote stemness in hepatocellular carcinoma. EMBO J 2022;41:e110218. 10.15252/embj.2021110218