Highly accurate diagnosis of pancreatic cancer by integrative modeling using gut microbiome and exposome data

Yuli Zhang; Haohong Zhang; Bingqiang Liu; Kang Ning

doi:10.1016/j.isci.2024.109294

2024 Feb 21;27(3):109294. doi: 10.1016/j.isci.2024.109294

PMCID: PMC10915599 PMID: 38450156

Summary

The noninvasive detection of pancreatic ductal adenocarcinoma (PDAC) remains an immense challenge. In this study, we proposed a robust, accurate, and noninvasive classifier, namely Multi-Omics Co-training Graph Convolutional Networks (MOCO-GCN). It achieved high accuracy (0.9 $\pm$ 0.06), F1 score (0.9 $\pm$ 0.07), and AUROC (0.89 $\pm$ 0.08), surpassing contemporary approaches. The performance of the model was validated on an external cohort of German PDAC patients. Additionally, mediation analysis suggested that the exposome may affect PDAC development through its complex interplay with the gut microbiome. For example, Fusobacterium hwasookii nucleatum, known for its ability to induce inflammatory responses, may serve as a mediator for the impact of rheumatoid arthritis on PDAC. Overall, our study sheds light on how the exposome and microbiome in concert could contribute to PDAC development, and enables PDAC diagnosis with high fidelity and interpretability.

Subject areas: Environment, Microbiome, Cancer, Machine learning

Highlights

• MOCO-GCN achieved high-precision, noninvasive diagnosis of PDAC
• Refined insights into the intricate interplay between the exposome and microbiome in PDAC
• MOCO-GCN identified potential microbiome and exposome biomarkers for PDAC

Introduction

Pancreatic ductal adenocarcinoma (PDAC) is the fourth leading cause of cancer-related death.¹,² It has the poorest prognosis of all cancers: only about 4% of patients survive five years after diagnosis, as the disease often presents at an advanced stage.³,⁴ Recent studies have explored PDAC biomarkers in tumor,⁵,⁶ blood,⁷ pancreatic tissue,⁸ urine,⁹ and serum.¹⁰ Currently, the only FDA-approved biomarker for pancreatic cancer is carbohydrate antigen (CA) 19-9; however, its specificity is limited by a high false positive rate, as its concentration may also increase in benign conditions such as gallstones and bile duct obstruction.¹¹,¹² Consequently, a noninvasive, robust, and accurate screening and diagnostic tool for PDAC is still urgently needed.

Numerous studies have explored links between PDAC and the oral¹³,¹⁴,¹⁵ or fecal microbiome.¹⁶,¹⁷ Nagata et al. conducted a multinational study and predicted PDAC using 30 gut and 18 oral microbial species, achieving area under the receiver operating characteristic curve (AUROC) values of 0.78–0.82.¹⁶ Kartal et al. proposed a fecal metagenomic classifier based on 27 gut microbial species that identified PDAC with high accuracy (0.84 AUROC), validated the classifier in an independent German cohort (0.83 AUROC), and confirmed its specificity in 25 publicly available studies.¹⁷ Additionally, Half et al.¹⁸ and Ren et al.¹⁹ employed random forests to predict PDAC, with AUROCs of 0.825 and 0.842, respectively. These studies have shown that microbiota-based screening for the detection of PDAC is feasible.

The exposome is the comprehensive collection of all exposures, including smoking, alcohol, diet, exercise, other lifestyle factors, medication, host diseases, and more.²⁰ Risk factors associated with the development of PDAC include alcohol,²¹ advancing age,²²,²³ smoking,²⁴ family history,²⁵,²⁶ diabetes,²⁷,²⁸ and obesity.²⁹ Changes in the gut microbiome can both affect and mediate the effects of the exposome on the risk of PDAC. For example, dysbiosis of the microbiota has been linked to an increased incidence of obesity,³⁰ with signaling pathways that lead to NF-κB activation and promote inflammation.³¹ Conversely, exposures that affect pancreatic tumor evolution could also affect the gut microbiome. Phillip et al. suggested that long-term alcohol consumption could induce dysbiosis of Firmicutes and Bacteroidetes, which are enriched or depleted in PDAC.³² Meanwhile, physical activity may protect against PDAC by increasing the abundance of SCFA-producing bacteria.³³ Therefore, the microbiome and exposome could in concert influence the metabolic and immune pathways of PDAC.

Pancreatic cancer used to be considered a localized disease because it arises in the pancreas, an abdominal organ that lies behind the lower part of the stomach.³⁴ However, the current understanding of PDAC is that it is not solely a tumor microenvironment issue but also a systemic and environmental disease that involves both the microbiome and exposome. The possible pathways of microbiome and exposome interactions that influence pancreatic carcinogenesis are shown in Figure S1A. Treatment efficiency and adverse effects can differ vastly between individuals due to differences in age, sex, and environmental factors. The aim of precision medicine is thus to design the most appropriate intervention based on the biological information of each individual. Most existing efforts focus on exploring the role of the microbiome in PDAC, but this approach oversimplifies the complexity of biological systems. The exposome also affects complex molecular pathways in which different biological layers interact with each other. Consequently, although PDAC should be considered a systemic and environmental disease shaped by both the exposome and the microbiome, studies of pancreatic cancer have rarely combined exposome and microbiome data for its diagnosis.

In this study, we presented a robust, accurate, and noninvasive classifier based on the combination of exposome and microbiome data for PDAC diagnosis. Our cohort consisted of patients with detailed host variables, including subject characteristics, lifestyle factors, oral health status, medication use, and other host disease status. We found significant differences in gut microbiome profiles between PDAC patients and controls, as determined by microbiome-associated statistical and machine-learning analyses. Through mediation analyses, we revealed putative causal relationships between PDAC, the gut microbiome, and the exposome. For example, Fusobacterium hwasookii nucleatum can mediate the impact of rheumatoid arthritis on PDAC. Next, we applied MOCO-GCN, which enables omics-specific and cross-omics association learning, for effective PDAC classification. Our classifier achieved excellent discrimination performance: a combination of 125 microbial species and 23 exposures yielded 0.9 $\pm$ 0.06 ACC, 0.9 $\pm$ 0.07 F1, 0.89 $\pm$ 0.08 AUROC, 0.86 $\pm$ 0.13 Sn, 0.93 $\pm$ 0.10 Sp, and 0.80 $\pm$ 0.11 MCC. MOCO-GCN also achieved high prediction performance on an external cohort (AUROC = 0.81 $\pm$ 0.09). Our findings about changes in microbiome abundance (enriched or depleted) in PDAC and exposures are consistent with previous studies, including the increase of Fusobacterium hwasookii nucleatum and the depletion of Faecalibacterium prausnitzii. MOCO-GCN outperformed traditional machine learning and other multi-omics classification methods, and our results were also better than those from previous studies that predicted PDAC with microbiome data alone. These results highlight the importance of considering both exposome and microbiome data for PDAC diagnosis. Overall, our study provides valuable insights into the complex interplay between the exposome and microbiome and their contribution to the development of PDAC. The complete workflow of this study is illustrated in Figure S1B.

Results

Data collection and preparation

To predict PDAC by combining exposome and microbiome data, we collected fecal metagenomic samples with species-level microbial profiles: 107 from a Spanish cohort and 76 from a German cohort.¹⁷ Missing values in the German metadata were imputed using the missForest algorithm, a random forest-based method for missing data imputation,³⁵ with 100 trees. Every metadata variable was defined as binary, with a positive and a negative class; for example, alcohol was considered positive for drinkers and negative for non-drinkers. Full definitions and descriptions are shown in Table S3. The PDAC patients in this study ranged in age from 38 to 93 years. Among the 107 participants, 67 were female and 40 were male. Smoking status included ex-smokers, smokers, and non-smokers. Alcohol consumption was categorized into drinkers and non-drinkers. Ten other health conditions were recorded, including high blood pressure, diabetes, and gum recession. Additionally, medication information covered the use of seven drugs, including aspirin and antibiotics.

Impact of exposome on microbial composition in pancreatic cancer patients

To identify the impact of the exposome on microbial composition in pancreatic cancer, we used gut microbiome data from the Spanish cohort,¹⁷ which included 57 PDAC patients and 50 controls, along with 24 detailed host variables (Figure S1C). Each metadata variable was classified as binary, with positive and negative classes. The 24 host metadata variables fell into five groups: subject characteristics, lifestyle factors, oral health, medication use, and host disease. Most of these variables are known risk factors for pancreatic cancer. We then asked whether the distribution of microbial composition among the participants differed with variations in host variables.

To achieve this, we created two cohorts, a confounder-matched cohort and a confounder-unmatched cohort, according to whether the confounding variables were matched between cases and controls. For the matched cohort, we reselected PDAC patients in a pairwise manner, pairing each with a control participant such that at most five of the 24 host metadata variables differed. Conversely, for the unmatched cohort, patients and controls were chosen to differ in the confounding variables as much as possible. The matched and unmatched cohorts each comprised 50 samples (25 cases and 25 controls). The specific cohorts are shown in Table S4. We then conducted a series of statistical analyses to explore whether matching cases and controls for confounding variables could reduce observed differences in the microbiota. Our analysis of beta diversity revealed a significant difference in microbiota between PDAC patients and controls in the unmatched cohort (PERMANOVA, p = 0.001), but not in the matched cohort (p = 0.249) (Figure 1A). Similarly, the alpha diversity difference was more pronounced in the unmatched cohort (Wilcoxon test, p = 0.002) than in the matched cohort (p = 0.099) (Figure 1B). In addition, we used the Wilcoxon test to identify differentially abundant taxa between PDAC and controls (Figures 1C and 1D): in the unmatched cohort, 5 taxa had p values below 0.001, 31 below 0.01, and 88 below 0.05, whereas in the matched cohort, 12 taxa had p values below 0.01 and 83 below 0.05 (Figure S2). Additionally, we calculated AUROC values using a random forest with 25-repeat stratified 4-fold cross-validation to discriminate PDAC from controls before and after matching variables. According to the AUROC values, the matched and unmatched cohorts differed markedly (Figure 1E). Overall, our results indicated that the exposome played a significant role in shaping the composition of the gut microbiome in PDAC patients.
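The pairwise matching criterion (at most five of the 24 binary host variables may differ between a case and its control) amounts to a Hamming-distance check. The following is a minimal sketch only: the greedy pairing rule, the function names, and the toy participants are assumptions made for illustration, since the text does not state how pairs were actually selected.

```python
from typing import Dict, List, Tuple


def hamming(a: List[int], b: List[int]) -> int:
    """Number of binary host variables on which two participants differ."""
    return sum(x != y for x, y in zip(a, b))


def greedy_match(
    cases: Dict[str, List[int]],
    controls: Dict[str, List[int]],
    max_diff: int = 5,
) -> List[Tuple[str, str]]:
    """Pair each case with its closest unused control (assumed greedy rule).

    A pair is kept only if the two participants differ on at most
    `max_diff` host variables.
    """
    unused = dict(controls)
    pairs = []
    for case_id, case_vars in cases.items():
        if not unused:
            break
        best = min(unused, key=lambda c: hamming(case_vars, unused[c]))
        if hamming(case_vars, unused[best]) <= max_diff:
            pairs.append((case_id, best))
            del unused[best]
    return pairs


# Hypothetical toy data with 24 binary host variables per participant.
toy_cases = {"P1": [0] * 24, "P2": [1] * 24}
toy_controls = {
    "C1": [0] * 22 + [1, 1],      # differs from P1 on 2 variables
    "C2": [1] * 24,               # identical to P2
    "C3": [0] * 12 + [1] * 12,    # differs from both cases on 12 variables
}
toy_pairs = greedy_match(toy_cases, toy_controls)
```

An unmatched cohort could be sketched the same way by selecting the control with the largest Hamming distance instead of the smallest.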

Variation of the matched and unmatched cohort in microbiota due to confounding variables between PDAC and controls

(A) Principal coordinates analysis (PCoA) plot of PDAC and controls in the confounding-unmatched and matched cohort, with the PERMANOVA p value. The centroids for the PDAC and controls are depicted by an outlined circle. Colors denote groups, with blue for controls and red for PDAC patients.

(B) Alpha diversity measurements comparing PDAC and controls in the unmatched and matched cohort. It was calculated as the Shannon index. Colors denote groups, with blue for controls and red for PDAC patients. Pairwise comparisons were performed using the Wilcoxon test.

(C and D) Differential abundance analysis between PDAC and controls in the matched and unmatched cohorts, performed with the Wilcoxon test. The y axis is log10 (p values) and the x axis is the generalized fold change. Purple dots represent species that are significantly differentially abundant in either group, while black dots show non-significant species.

(E) Random forest AUROC values for PDAC and controls before and after matching for confounding variables.

We used the aforementioned random forest framework (Figure 2A) to predict each binary variable in turn from microbiome data. The resulting AUROC values (Figure 2B) revealed significant associations between the microbiota and certain host variables, including jaundice, alcohol, acid regurgitation medication use, family history of PDAC, corticosteroid medication use, diabetes, and country (mean AUROC >0.6). In addition, we observed 21 significant associations (FDR_Spearman < 0.05) between 21 species and 6 exposures (Figure 2C and Table S1). For instance, the consumption of probiotics showed a significant positive correlation with the abundance of Lactobacillus species and Clostridium species. The cellular components and metabolites of these species play a crucial role in probiotic functions, primarily by activating gut epithelial cells and improving the integrity of the intestinal barrier.³⁶
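The FDR threshold applied to the Spearman p values relies on a multiple-testing correction, which the legend of Figure 2C names as the Benjamini-Hochberg procedure. As a minimal pure-Python sketch of that standard procedure (not the authors' code; the function name and the example p values are hypothetical):

```python
from typing import List


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so that adjusted values
    # stay monotone (each is the minimum over all larger ranks).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p values for four species-exposure pairs.
example_fdr = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
significant = [q < 0.05 for q in example_fdr]
```

For these four illustrative p values the adjusted values are 0.02, 0.04, 0.04, and 0.02, so all four pairs would pass an FDR threshold of 0.05.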

The random forest analysis framework and the significant association between species and exposome

(A) The random forest analysis framework. The 25-repeat stratified 4-fold cross-validation over 75/25 splits was used for each binary variable.

(B) AUROC results for 23 binary lifestyle and disease variables.

(C) Correlation network diagram showing the significant associations between 21 species and 6 exposures, calculated using the Spearman correlation coefficient. The FDR was calculated using the Benjamini-Hochberg correction. Red lines denote positive relationships, while blue lines denote negative relationships.

Exposome‒microbiome mediation effects in PDAC

To explore the connections between the exposome, the gut microbiome, and PDAC, we conducted a bi-directional mediation analysis using 23 exposures and 9 species that exhibited significant associations with PDAC (FDR_Spearman < 0.05). We identified a total of 29 mediating linkages (FDR_mediation < 0.05 & FDR_{inverse-mediation} > 0.05), with 23 involving the exposome affecting PDAC through the microbiome, and 6 involving the microbiome affecting PDAC through the exposome (Figures 3A and 3B). Most of these linkages were related to the impact of lifestyle factors and host disease on PDAC through the microbiome. For example, we observed that the abundance of Alloscardovia omnicolens can mediate the effect of diabetes on the risk of PDAC (Figure 3C).

Mediation analysis identifies linkages between the microbiome, exposome and pancreatic cancer

(A) Parallel coordinates chart showing the 23 mediation effects of exposome on PDAC through the microbiome, with significant level (FDR < 0.05). Shown are exposome (left), microbiome (right). The curved lines connecting the panels indicate the mediation effects.

(B) Parallel coordinates chart showing the 6 mediation effects of the microbiome on PDAC through the exposome, with significant level (FDR < 0.05). Shown are microbiome (left), exposome (right).

(C) Analysis of the effect of diabetes on PDAC as mediated by the abundance of Alloscardovia omnicolens.

(D) Analysis of the effect of rheumatoid arthritis on PDAC as mediated by the abundance of Fusobacterium hwasookii nucleatum.

Rheumatoid arthritis (RA) is a chronic and systemic disease primarily characterized by inflammatory synovitis, the underlying cause of which remains unknown.³⁷ Fusobacterium hwasookii nucleatum has the ability to activate the immune system and trigger inflammatory responses.³⁸ In patients with RA, immune dysregulation and the progression of joint inflammation can potentially influence the composition of the microbial community, thereby contributing to an increase in Fusobacterium hwasookii nucleatum abundance. We observed that Fusobacterium hwasookii nucleatum can mediate the impact of rheumatoid arthritis on PDAC (Figure 3D; P_mediation = 2.2 $\times$ 10⁻¹⁶). Previously, Motasem et al. conducted a comprehensive nationwide study, which proposed that RA can manifest with extra-articular involvement in multiple organs, including the pancreas. Their findings revealed an elevated risk of pancreatic cancer among patients with RA, and those with a history of RA often exhibited a poorer prognosis.³⁹

The exposome is used in concert with gut microbiome data to predict pancreatic cancer

To investigate the potential of combining exposome and gut microbiome data in predicting pancreatic cancer, we employed a framework called MOCO-GCN, which integrates predictions from both sources by leveraging their potential influence on the disease course and outcomes. MOCO-GCN is composed of a two-view co-training Graph Convolutional Networks (GCNs) module and a View Correlation Discovery Network (VCDN) to classify PDAC and controls. The framework of MOCO-GCN is shown in Figure 4A. Specifically, the co-training GCNs predict initial labels from exposome and microbiome data by distilling knowledge from each other, while the VCDN integrates the initial labels by exploring the latent associations in the higher-level label space across exposome and microbiome data.⁴⁰ To evaluate the performance of MOCO-GCN, we performed 4-fold cross-validation and assessed the final model performance using the average accuracy (ACC), F1 score (F1), AUROC, sensitivity (Sn), specificity (Sp), and Matthews correlation coefficient (MCC).
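For reference, the threshold-based metrics listed above can all be derived from the confusion matrix of a single fold. The sketch below is illustrative only (the function name and toy labels are hypothetical, and AUROC is omitted because it requires predicted scores rather than hard labels); the reported values would be averages of such per-fold results.

```python
import math
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute ACC, F1, Sn, Sp, and MCC for labels coded 1 (PDAC) and 0 (control)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num: float, den: float) -> float:
        # Return 0.0 for an undefined ratio (a convention assumed here).
        return num / den if den else 0.0

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "ACC": ratio(tp + tn, len(pairs)),
        "F1": ratio(2 * tp, 2 * tp + fp + fn),
        "Sn": ratio(tp, tp + fn),   # sensitivity (recall on PDAC)
        "Sp": ratio(tn, tn + fp),   # specificity (recall on controls)
        "MCC": ratio(tp * tn - fp * fn, mcc_den),
    }


# Hypothetical fold: 4 PDAC and 4 controls, one error in each class.
fold_metrics = binary_metrics([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 0, 1])
```

In this toy fold, ACC, F1, Sn, and Sp are all 0.75 and MCC is 0.5.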

Illustration and performance of MOCO-GCN

(A) The framework of MOCO-GCN. It combines a Two-view Co-training Graph Convolutional Networks (GCNs) module, which learns the features of different omics data by distilling knowledge from each other, and a View Correlation Discovery Network (VCDN) module, which integrates multi-omics data. The species-GCN and the exposome-GCN are each trained to perform class prediction using the omics features and the corresponding sample similarity network generated from the microbiome or exposome data. Co-training allows them to distill knowledge from each other by adding their most confident unlabeled data into the training set. The cross-omics discovery tensor is calculated from the initial predictions of the omics-specific GCNs and forwarded to the VCDN for final prediction. MOCO-GCN is an end-to-end model and all networks are trained jointly.

(B) The performance of MOCO-GCN on the Spanish and German cohorts is shown as receiver operating characteristic (ROC) curves with the 95% CI shaded in the corresponding color.

(C) Performance of MOCO-GCN under different values of hyper parameter k.

(D) The comparison between MOCO-GCN and several traditional machine learning methods, including Support vector machine classifier (SVM), Linear regression trained with L1 regularization (Lasso), Random Forest classifier (RF), and Gradient tree boosting-based classifier (XGBoost), as well as other multi-omics classification methods: MOGONET (Multi-Omics Graph Convolutional NETworks) and NN_VCDN (fully connected NN with the same layers as the GCN in MOGONET). Data are represented as mean ± SEM.

(E) The comparison between this study and previous studies that predict PDAC using gut microbiome alone.

We trained our model on 23 exposures and 125 species selected through differential abundance analyses (Wilcoxon, p < 0.05; Table S2) and achieved excellent performance with 0.9 $\pm$ 0.06 ACC, 0.9 $\pm$ 0.07 F1, 0.89 $\pm$ 0.08 AUROC (Figure 4B), 0.86 $\pm$ 0.13 Sn, 0.93 $\pm$ 0.10 Sp, and 0.80 $\pm$ 0.11 MCC. Additionally, we conducted a sensitivity analysis that focused on the parameter k, which represents the average number of edges retained per node. Figure 4C illustrates the performance of MOCO-GCN as k varies from 2 to 10, demonstrating the stability of our model. We compared the performance of our model with several traditional machine learning methods, including Support vector machine classifier (SVM), Linear regression trained with L1 regularization (Lasso), Random Forest classifier (RF), and Gradient tree boosting-based classifier (XGBoost), as well as other multi-omics classification methods: MOGONET (Multi-Omics Graph Convolutional NETworks)⁴¹ and NN_VCDN (fully connected NN with the same layers as the GCN in MOGONET). The traditional machine learning methods were trained with the direct concatenation of the 125 species and 23 exposures as input. According to the classification results (Figure 4D), our model outperformed the previous methods and was better able to predict pancreatic cancer from the integration of exposome and microbiome data.

According to the feature importance calculated by our model, the top 45 features, comprising three exposures and 42 species, are shown as a heatmap in Figure S3. Seventeen bacterial species were increased in the PDAC patients (n = 57) compared with the controls (n = 50), whereas 25 bacterial species were decreased. Among the 42 important species, 26 (61.9%) belonged to the Firmicutes phylum, 7 (16.7%) to the Fusobacteria phylum, 3 (7.1%) to the CFB group bacteria phylum, 1 (2.4%) to the Actinobacteria phylum, 1 (2.4%) to the Basidiomycete fungi phylum, 3 (7.1%) to the High G + C Gram-positive bacteria class, and 1 (2.4%) to the B-proteobacteria class. These results demonstrated the crucial role of the Firmicutes phylum in shaping the division between PDAC and controls. Species increased in the gut microbiomes of PDAC patients included Fusobacterium hwasookii nucleatum, Alloscardovia omnicolens, Veillonella spp. (Veillonella atypica and Veillonella parvula), and several unknown species in the phylum Firmicutes, while depleted species included several from the order Clostridiales, Bacteroides coprocola, Faecalibacterium prausnitzii, Bifidobacterium bifidum, and unknown Bacteroidales. Of note, our results were consistent with previous studies¹⁶,¹⁷,¹⁸,¹⁹,⁴² for 27 of the 42 species investigated.

Validation on an external cohort and comparison with previous studies

To evaluate the specificity of the trained models for PDAC, we assessed the accuracy of predictions using a dataset from a German study.¹⁷ This dataset consisted of 44 PDAC patients and 32 controls, with detailed information on 14 exposures. On this German validation population, MOCO-GCN achieved an accuracy (ACC) of 0.89 ± 0.07, an F1 score of 0.91 ± 0.04, and an AUROC of 0.81 ± 0.19 (Figure 4B). To further validate the performance of our model, we collected studies conducted within the past five years¹⁶,¹⁷,¹⁸,¹⁹,⁴³ that investigated microbial prediction of pancreatic cancer. These studies primarily utilized traditional machine learning methods such as random forest and lasso regression. As illustrated in Figure 4E, our model, which incorporates exposome data, exhibited superior predictive capabilities compared to previous studies. These results collectively demonstrate the practicality and efficacy of MOCO-GCN in predicting pancreatic cancer by leveraging exposome and microbiome data.

Discussion

This study represents an advancement in our understanding of the complex relationship between PDAC, microbiome, and exposome. Our findings provide compelling evidence for the influential role of exposome in microbiome-related studies of pancreatic cancer. We not only accurately predicted PDAC with the combination of microbiome and exposome, but also yielded important insights into the species and exposure level associations between these factors. First, our model demonstrated superior performance compared to other methods and previous studies, achieving satisfactory results on an external cohort. This underscores the importance of comprehensively considering the microbiome and exposome in PDAC-related research. Second, we emphasize the pivotal role of exposome in the interplay between PDAC and microbiome, highlighting the need to account for the specificity and correlation between these factors in future studies. Third, we assert that pancreatic cancer should not be regarded only as a localized disease, but rather as a systemic, environmental, and microenvironmental disease. Taken together, this study represents an important contribution to our understanding of the factors that contribute to PDAC development and underscores the importance of incorporating microbiome and exposome data in future research and clinical practice.

Limitations of the study

This study provided valuable insights into the role of the microbiome and exposome in PDAC. However, it has several limitations that warrant acknowledgment. First, the small sample size and the challenging nature of collecting comprehensive gut microbiome and exposome data from a large population of pancreatic cancer patients highlight the need for more extensive longitudinal data to further elucidate the clinical translational and practical applications of this research. Second, the meta-variables data were limited to binary values, which restricted our ability to perform a more detailed analysis of factors such as alcohol consumption and smoking. Moreover, due to the lack of available data, we were unable to analyze two crucial risk factors for PDAC, exercise and diet. Third, despite the exceptional performance of our predictive model, the inherent complexity of deep learning models limits their interpretability, posing challenges in understanding the factors driving predictions. Therefore, more comprehensive follow-up research and analysis are required to clarify the mechanisms underlying PDAC as a microenvironmental and systemic disease. Addressing these limitations can provide a more comprehensive understanding of the factors contributing to PDAC development, informing the development of more effective prevention and treatment strategies. While our study primarily focuses on clinical aspects, we recognize the need for additional research to address challenges related to clinical costs and to advance practical applications in real-world medicine. In conclusion, our research provides initial methodologies and evidence for PDAC diagnosis based on microbiome and exposome data, but further research based on large-scale and more in-depth pancreatic cancer data are essential.

STAR★Methods

Key resources table

REAGENT or RESOURCE	SOURCE	IDENTIFIER
Deposited data

Raw sequencing data	Kartal et al.¹⁷	European Nucleotide Archive (ENA). Dataset identifiers: PRJEB38625; PRJEB42013

Software and algorithms

Python (version 3.7.15)	Python Software Foundation	https://www.python.org/
Pandas (version 1.2.4)	Python package	RRID: SCR_018214; https://pandas.pydata.org/
PyTorch (version 1.11.0)	Python package	RRID: SCR_018536; https://pytorch.org/
scikit-learn (version 1.0.2)	Python package	RRID: SCR_002577; http://scikit-learn.org/
R (version 4.2.1)	R software	http://www.R-project.org
MOGONET	Wang et al.⁴¹	https://github.com/txWang/MOGONET
MOCO-GCN	This paper	https://github.com/Yuli-SDU/MOCO-GCN

Resource availability

Lead contact

Further information and requests for resources should be directed to and will be fulfilled by the lead contact, Bingqiang Liu (e-mail: bingqiang@sdu.edu.cn).

Materials availability

This study did not generate new unique reagents.

Data and code availability

• All the raw data used in this study have been deposited in the European Nucleotide Archive (ENA). The accession numbers are listed in the key resources table.
• All source code is available at https://github.com/Yuli-SDU/MOCO-GCN.
• Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.

Method details

MOCO-GCN

Microbiome and exposome integration strategies are needed to combine the complementary knowledge brought by each omics layer. Early integration approaches used traditional machine learning methods based on the concatenation of every dataset into a single large matrix.⁴⁴,⁴⁵,⁴⁶ This ignores the specific data distribution of each omics type, which can misguide ML models into finding irrelevant patterns that simply reflect the features' membership in the same omics layer. To utilize the correlations across different classes and different omics data types and further boost performance, we introduced MOCO-GCN, a framework for classification tasks with multi-omics data. The model is mainly composed of a Two-view Co-training Graph Convolutional Networks (GCNs) module, which learns microbiome and exposome data features and improves the generalization ability of the GCNs through cooperation among multiple learners, and a View Correlation Discovery Network (VCDN)⁴⁰ module for multi-omics data integration. The detailed architecture of the model for predicting pancreatic cancer in this study is shown in Figure 4A. First, for the microbiome and exposome data, we constructed $\mathrm{GCN}_{species}$ and $\mathrm{GCN}_{exposome}$ on weighted sample similarity networks built with cosine similarity, each trained on its specific omics data. We then used co-training to train the GCNs on the two views and exchange labels of unlabeled instances iteratively. The initial predictions of class labels were thus generated by the co-training of $\mathrm{GCN}_{species}$ and $\mathrm{GCN}_{exposome}$. Second, we constructed the cross-omics discovery tensor, which integrates the label correlations of $\mathrm{GCN}_{species}$ and $\mathrm{GCN}_{exposome}$. After being reshaped into a vector, it was forwarded to the VCDN for final label prediction. MOCO-GCN is an end-to-end model, and the modules are trained alternately until convergence.
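To make the cross-omics discovery tensor concrete: for two views, a natural reading (following the VCDN design used in MOGONET) is that each entry is the product of one class probability from the species view and one from the exposome view, so the tensor is an outer product that is then flattened. The sketch below rests on that assumption; the function name and the probabilities are hypothetical, and the MOCO-GCN repository remains the authoritative implementation.

```python
from typing import List


def cross_omics_vector(p_species: List[float], p_exposome: List[float]) -> List[float]:
    """Flattened outer product of two class-probability vectors.

    Entry (a, b) is p_species[a] * p_exposome[b]; flattening row by row
    gives a vector of length C * C for C classes.
    """
    return [a * b for a in p_species for b in p_exposome]


# Hypothetical initial predictions for one sample, ordered (control, PDAC).
vec = cross_omics_vector([0.8, 0.2], [0.6, 0.4])
```

With two classes the vector has four entries (here approximately 0.48, 0.32, 0.12, and 0.08), which would form the input to the VCDN for that sample.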

Two-view Co-training GCNs

First, we used cosine similarity for exposome and microbiome data respectively to construct sample similarity networks. The cosine similarity calculation formula is as follows:

$$\mathrm{similarity} = \cos(\theta) = \frac{A \cdot B}{\| A \| \| B \|} = \frac{\sum_{i = 1}^{n} A_{i} B_{i}}{\sqrt{\sum_{i = 1}^{n} A_{i}^{2}} \sqrt{\sum_{i = 1}^{n} B_{i}^{2}}}$$

(Equation 1)

where $A$ and $B$ are the attribute vectors of two samples within the same data type (microbiome or exposome) and $n$ is the length of these vectors. Each participant is a node in the sample similarity network.
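Equation 1 translates directly into code. The following minimal sketch uses only the standard library; the function name and the zero-vector convention (returning 0.0) are assumptions made for illustration.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Cosine similarity is undefined for an all-zero vector;
        # returning 0.0 is a convention assumed here.
        return 0.0
    return dot / (norm_a * norm_b)


# Two hypothetical samples with proportional profiles are maximally similar.
sim_proportional = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
sim_orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

Because the measure depends only on direction, two samples whose profiles differ by a constant factor receive a similarity of 1.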

Further, we constructed the feature matrix and the graph structure as the inputs to the GCN model. The feature matrix is defined as $X \in \mathbb{R}^{n \times d}$, where $n$ is the number of nodes and $d$ is the number of features. The graph structure is expressed as an adjacency matrix $A \in \mathbb{R}^{n \times n}$, where $A$ is constructed by calculating the cosine similarity between nodes:

$$A_{i j} = \begin{cases} s (x_{i}, x_{j}), & \text{if } i \neq j \text{ and } s (x_{i}, x_{j}) \geq \epsilon \\ 0, & \text{otherwise} \end{cases}$$

(Equation 2)

Where $A_{i j}$ represents the adjacency between node $i$ and node $j$ , $x_{i}$ and $x_{j}$ are the feature vectors of node $i$ and node $j$ . $ε$ is the threshold determined by $k$ , which is the average number of edges retained per node:

$$k = \sum_{i, j} I (s (x_{i}, x_{j}) \geq \epsilon) / n$$

(Equation 3)

Where $I (\cdot)$ is the indicator function and note that for the parameter $k = 1$ , $A$ would contain no edge and a GCN would degenerate to a normal neural network (NN). A proper $k$ value is important for the performance of MOCO-GCN. Figure 4C shows the performance of MOCO-GCN when $k$ varies from 2 to 10.
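As an illustration of Equations 1–3, the following NumPy sketch builds a thresholded cosine-similarity adjacency matrix for one omics view. It is not the authors' implementation: the function names `cosine_similarity_matrix` and `build_adjacency` are hypothetical, and the rule used to derive the threshold from $k$ (keep the $k \cdot n$ largest similarities, counting each node's self-similarity, so that $k = 1$ leaves no edges) is an assumption chosen to be consistent with Equation 3.

```python
import numpy as np

def cosine_similarity_matrix(X):
    """Pairwise cosine similarity between the rows (samples) of X (Equation 1)."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.clip(norms, 1e-12, None)  # guard against all-zero rows
    S = Xn @ Xn.T
    return (S + S.T) / 2.0  # enforce exact symmetry despite rounding

def build_adjacency(X, k):
    """Thresholded adjacency (Equation 2), with epsilon derived from k (Equation 3)."""
    n = X.shape[0]
    S = cosine_similarity_matrix(X)
    # Assumption: Equation 3 counts all (i, j) pairs, self-pairs included,
    # so the k * n largest similarities should pass the threshold.
    n_keep = max(1, min(int(round(k * n)), n * n))
    eps = np.sort(S, axis=None)[::-1][n_keep - 1]
    A = np.where(S >= eps, S, 0.0)
    np.fill_diagonal(A, 0.0)  # Equation 2 sets A_ij = 0 when i == j
    return A, eps

# Toy example: 6 samples with 4 non-negative features (e.g. relative abundances).
rng = np.random.default_rng(0)
X = rng.random((6, 4))
A, eps = build_adjacency(X, k=3)
```

With this convention, each node keeps on average $k - 1$ weighted neighbors, and ties in similarity can retain slightly more edges than requested.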

Finally, we built each GCN by stacking multiple convolutional layers. Each layer is defined as:

H^{(l+1)} = f(H^{(l)}, A) = \sigma(\hat{D}^{-\frac{1}{2}} (A + I) \hat{D}^{-\frac{1}{2}} H^{(l)} W^{(l)})

(Equation 4)

where $l$ indexes the graph convolutional layers, $H^{(l)}$ is the input of the $l$th layer, $W^{(l)}$ is the weight matrix of the $l$th layer, $H^{(l+1)}$ is the output of the $l$th layer, $\hat{D}$ is the diagonal node degree matrix of $A + I$, and $I$ is the identity matrix. $\sigma(\cdot)$ represents a non-linear activation function.
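The propagation rule in Equation 4 can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions rather than the published code: the helper names are hypothetical, ReLU is chosen as the activation only for concreteness, and the degree matrix is taken over $A + I$, following the usual GCN convention.

```python
import numpy as np

def normalize_adjacency(A):
    """Return D^{-1/2} (A + I) D^{-1/2}, with D the degree matrix of A + I."""
    A_tilde = A + np.eye(A.shape[0])
    deg = A_tilde.sum(axis=1)  # >= 1 when edge weights are non-negative
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def relu(x):
    return np.maximum(x, 0.0)

def gcn_layer(H, A_norm, W, activation=relu):
    """One graph convolution (Equation 4): activation(A_norm @ H @ W)."""
    return activation(A_norm @ H @ W)

# Two connected nodes with one-hot features and identity weights:
# each node's output is the average of its own and its neighbor's features.
A = np.array([[0.0, 1.0], [1.0, 0.0]])
H = np.eye(2)
W = np.eye(2)
H_next = gcn_layer(H, normalize_adjacency(A), W)
```

Stacking layers amounts to feeding `H_next` back into `gcn_layer` with a new weight matrix.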

To overcome the limits of GCN models with shallow architectures, we introduced co-training and self-training to train the GCNs. As a multi-view learning method, co-training exploits the information in unlabeled data and improves the generalization ability of the model through cooperation among multiple learners.⁴⁷ Given the labeled data $y_{i} (i = 1, 2, 3, \ldots)$ and the unlabeled data $y_{j} (j = 1, 2, 3, \ldots)$, the predictions $y_{i}^{species}$ and $y_{i}^{exposome}$ under the two views are obtained from the learners $GCN_{species}$ and $GCN_{exposome}$. The classifiers estimate the label confidence of the unlabeled samples, and the most confidently labeled samples are added to the training set for iterative training. This allows the two classifiers to distill knowledge from each other. Training ends once all the unlabeled samples have been labeled by the trained models and $GCN_{species}$ and $GCN_{exposome}$ become stable.
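The co-training loop described above can be sketched as follows. To keep the example self-contained, a toy nearest-centroid classifier stands in for each per-view GCN; this substitution, the class and function names, and the parameters `n_add` and `max_rounds` are all assumptions made for illustration and do not reflect the authors' training code.

```python
import numpy as np

class CentroidClassifier:
    """Toy stand-in for a per-view GCN: softmax over negative centroid distances."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict_proba(self, X):
        d = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        z = -d
        z = z - z.max(axis=1, keepdims=True)
        p = np.exp(z)
        return p / p.sum(axis=1, keepdims=True)

def co_train(views, y, labeled, n_add=2, max_rounds=50):
    """views: list of (n, d_v) arrays; y: labels, trusted only at indices in `labeled`."""
    y = np.array(y).copy()
    labeled = set(int(i) for i in labeled)
    n = len(y)
    models = [CentroidClassifier() for _ in views]
    for _ in range(max_rounds):
        idx = np.array(sorted(labeled), dtype=int)
        for model, X in zip(models, views):
            model.fit(X[idx], y[idx])
        unlabeled = np.array(sorted(set(range(n)) - labeled), dtype=int)
        if unlabeled.size == 0:
            break  # every sample is labeled and both learners are refit
        for model, X in zip(models, views):
            proba = model.predict_proba(X[unlabeled])
            confidence = proba.max(axis=1)
            # Each learner contributes its most confident pseudo-labels
            # to the training set shared with the other learner.
            for t in np.argsort(-confidence)[:n_add]:
                i = int(unlabeled[t])
                if i not in labeled:
                    y[i] = model.classes_[proba[t].argmax()]
                    labeled.add(i)
    return models, y

# Toy data: two well-separated classes observed under two views.
rng = np.random.default_rng(0)
y_true = np.repeat([0, 1], 10)
view_species = 5.0 * y_true[:, None] + 0.3 * rng.standard_normal((20, 3))
view_exposome = 5.0 * y_true[:, None] + 0.3 * rng.standard_normal((20, 2))
labeled = [0, 1, 10, 11]
y_init = np.full(20, -1)
y_init[labeled] = y_true[labeled]
models, y_final = co_train([view_species, view_exposome], y_init, labeled)
```

On real data the pseudo-labels can be wrong, so a confidence cutoff in addition to the top-`n_add` rule may be a sensible safeguard.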

VCDN for label integration

VCDN is used to fully explore the cross-view label correlations and improve model performance. After the initial classification results $y_{i}^{species} \in R^{2}$ and $y_{i}^{exposome} \in R^{2}$ are obtained by the co-training of $GCN_{species}$ and $GCN_{exposome}$, the cross-view label-level adjacency matrix $c_{i} \in R^{2 \times 2}$ is defined as:

c_{i} = y_{i}^{exposome} (y_{i}^{species})^{T}

(Equation 5)

Then, $c_{i}$ is reshaped into a $2^{2}$-dimensional vector and forwarded to $C_{VCDN}(\cdot)$ to produce the final prediction. Here, $C_{VCDN}(\cdot)$ denotes a fully connected network with an output dimension of $2$.⁴⁰
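As a small worked example of Equation 5 and the reshaping step, the sketch below forms the outer product of two per-view prediction vectors and flattens it into the 4-dimensional VCDN input. The single softmax layer `vcdn_predict` and its random, untrained weights are hypothetical placeholders; the text specifies only a fully connected network with output dimension 2.

```python
import numpy as np

def cross_view_vector(y_exposome, y_species):
    """Equation 5 plus reshaping: outer product of the two views' outputs, flattened."""
    c = np.outer(y_exposome, y_species)  # shape (2, 2)
    return c.reshape(-1)                 # shape (4,)

def vcdn_predict(c_vec, W, b):
    """Hypothetical single fully connected layer with softmax, output dimension 2."""
    logits = c_vec @ W + b
    e = np.exp(logits - logits.max())
    return e / e.sum()

y_species = np.array([0.6, 0.4])
y_exposome = np.array([0.8, 0.2])
c_vec = cross_view_vector(y_exposome, y_species)  # [0.48, 0.32, 0.12, 0.08]

rng = np.random.default_rng(0)
W, b = rng.normal(size=(4, 2)), np.zeros(2)  # untrained, illustrative weights
final_prob = vcdn_predict(c_vec, W, b)
```

Each entry of `c_vec` pairs one class under the exposome view with one class under the species view, which is what lets the downstream network learn from agreement and disagreement between views.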

The loss function of MOCO-GCN is defined as:

L = L_{GCN}^{species} + L_{GCN}^{exposome} + L_{VCDN}

(Equation 6)

where $L_{GCN}^{j}$ can be written as:

L_{GCN}^{j} = \sum_{i=1}^{n} L_{CE}(\hat{y}_{i}^{j}, y_{i}) = \sum_{i=1}^{n} -\log \frac{e^{\hat{y}_{i}^{j} \cdot y_{i}}}{\sum_{k} e^{\hat{y}_{i,k}^{j}}}

(Equation 7)

where $y_{i} \in R^{2}$ is the one-hot ground-truth label vector of the $i$th sample, $L_{CE}(\cdot)$ is the cross-entropy loss function, and $\hat{y}_{i}^{j}$ is the predicted label vector of the $i$th training sample under view $j \in \{species, exposome\}$, with $k$th element $\hat{y}_{i,k}^{j}$. $L_{VCDN}$ can be written as:

L_{VCDN} = \sum_{i=1}^{n} L_{CE}(VCDN(c_{i}), y_{i})

(Equation 8)

We fix $VCDN(\cdot)$ and update $GCN_{j}(\cdot)$, $j \in \{species, exposome\}$, for the microbiome and exposome data to minimize the loss function $L$.
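The loss in Equations 6–8 can be checked numerically with a short NumPy sketch. The function names are hypothetical, the inputs are treated as unnormalized class scores passed through a softmax, and labels are assumed to be one-hot encoded; none of this is taken from the authors' code.

```python
import numpy as np

def cross_entropy(scores, y_onehot):
    """Sum over samples of -log softmax(scores)[true class], as in Equation 7."""
    z = scores - scores.max(axis=1, keepdims=True)  # stabilize the exponentials
    log_softmax = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-(log_softmax * y_onehot).sum())

def total_loss(scores_species, scores_exposome, scores_vcdn, y_onehot):
    """Equation 6: sum of the two per-view GCN losses and the VCDN loss."""
    return (cross_entropy(scores_species, y_onehot)
            + cross_entropy(scores_exposome, y_onehot)
            + cross_entropy(scores_vcdn, y_onehot))

# Two samples, two classes, completely uninformative scores:
# every term contributes -log(1/2) per sample.
y_onehot = np.array([[1.0, 0.0], [0.0, 1.0]])
zeros = np.zeros((2, 2))
loss = total_loss(zeros, zeros, zeros, y_onehot)  # 6 * log(2)
```

Uninformative predictions therefore give a loss of about 4.16 for two samples, and the value falls toward zero as all three sets of scores favor the true class.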

Finally, to verify the performance of MOCO-GCN, we used 4-fold cross-validation to compare the model with traditional machine learning models, namely a support vector machine classifier (SVM), logistic regression trained with L1 regularization (Lasso),¹⁷ a random forest classifier (RF),¹⁶ and a gradient tree boosting-based classifier (XGBoost), and with other multi-omics classification methods: MOGONET (Multi-Omics Graph Convolutional NETworks)⁴¹ and NN_VCDN (a fully connected NN with the same layers as the GCN in MOGONET). The evaluation metrics were accuracy (ACC), F1 score (F1), and AUROC. We reported the mean and standard deviation over the 4 folds. The framework was implemented in Python 3.10 with numpy, pandas, matplotlib, scikit-learn (KFold, LogisticRegression, RandomForestClassifier, SVC), xgboost (XGBClassifier), and torch.

Identifying biomarkers with MOCO-GCN

To determine the most important microbial species and exposures for diagnosing PDAC, we employed a feature ablation method⁴⁸ to calculate feature importance. Using the change in F1 score as the metric, we set each feature to zero in turn and calculated the decrease in MOCO-GCN performance on the test set relative to using all the features. Features that produced the greatest performance drop were considered the most important.
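The ablation procedure can be sketched independently of the model. In the example below, `predict` is any function mapping a feature matrix to 0/1 labels; the toy threshold rule used here is a hypothetical stand-in for the trained MOCO-GCN, and the function names are illustrative.

```python
import numpy as np

def f1_binary(y_true, y_pred):
    """F1 score for the positive class (label 1)."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom else 0.0

def ablation_importance(predict, X_test, y_test):
    """Drop in F1 on the test set when each feature is set to zero in turn."""
    baseline = f1_binary(y_test, predict(X_test))
    drops = np.empty(X_test.shape[1])
    for j in range(X_test.shape[1]):
        X_ablated = X_test.copy()
        X_ablated[:, j] = 0.0
        drops[j] = baseline - f1_binary(y_test, predict(X_ablated))
    return drops

# Toy model that only looks at feature 0.
predict = lambda X: (X[:, 0] > 0.5).astype(int)
X_test = np.array([[1.0, 3.0], [0.0, 7.0], [1.0, 2.0], [0.0, 5.0]])
y_test = np.array([1, 0, 1, 0])
drops = ablation_importance(predict, X_test, y_test)  # [1.0, 0.0]
```

Ranking features by `drops` in descending order then gives the importance ordering described above.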

Quantification and statistical analysis

Alpha and beta diversity analysis

The Shannon diversity⁴⁹ was used as a measure of within-individual diversity and was calculated using the R function “diversity”. The beta diversity⁵⁰ was used as a measure of between-sample differences in community composition and was quantified as Bray-Curtis dissimilarity. Principal coordinate analysis (PCoA) was used to ordinate and visualize the beta diversity matrices. Differences in alpha diversity between groups were examined with the Wilcoxon rank-sum test (Figure 2B). Differences in beta diversity between groups were tested using the ‘adonis2’ implementation of permutational multivariate analysis of variance (PERMANOVA) (Figure 2A). The R packages used for calculation and visualization in all analyses were vegan, ggplot2, ade4, RColorBrewer, dplyr, and reshape2.
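For readers who want to see the two diversity measures spelled out, the following Python sketch computes the Shannon index with the natural logarithm and the Bray-Curtis dissimilarity directly from abundance vectors. It is an illustrative re-implementation of the standard formulas, not the R code used in the study, and the function names are hypothetical.

```python
import numpy as np

def shannon(counts):
    """Shannon index H = -sum(p * ln p) over taxa with non-zero abundance."""
    counts = np.asarray(counts, dtype=float)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log(p)).sum())

def bray_curtis(u, v):
    """Bray-Curtis dissimilarity: sum|u - v| / sum(u + v)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(np.abs(u - v).sum() / (u + v).sum())

h_even = shannon([1, 1, 1, 1])              # ln(4): four equally abundant taxa
d_same = bray_curtis([1, 2, 3], [1, 2, 3])  # 0.0: identical communities
d_disjoint = bray_curtis([1, 0], [0, 1])    # 1.0: no shared taxa
d_mixed = bray_curtis([6, 7, 4], [10, 0, 6])  # 13 / 33
```

A matrix of pairwise `bray_curtis` values between samples is the input that ordination methods such as PCoA operate on.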

Differential abundance analysis

The differential abundance analyses between PDAC cases and controls were performed with the Wilcoxon rank-sum test using the R function “wilcox.test”. We used the generalized fold change (GFOLD) to quantify the fold change in microbial abundance between PDAC cases and controls. A generalized fold change greater than 0 indicates enrichment in PDAC, whereas a value less than 0 indicates enrichment in controls.

Machine learning

To identify microbiota-associated variables and differences in the microbial community between PDAC patients and controls, we constructed a random forest classifier evaluated with 25-repeat stratified fourfold cross-validation, which leveraged bootstrapping and cross-validation to distinguish the groups. The general pipeline is shown in Figure 3A. Training and testing occurred on separate, randomly selected, stratified splits of 75% and 25% of the data. The classifier was set to have 512 decision trees and a minimum of one sample per leaf, with the aim of limiting overfitting on the training set given the small number of samples relative to the number of ASVs. The mean AUROC was the average over the 100 repeats (25 repeats of four folds). The framework was implemented in Python 3.10 with numpy, pandas, matplotlib, and scikit-learn.
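The evaluation scheme described above maps onto standard scikit-learn components. The sketch below is one plausible arrangement under stated assumptions: the function name `repeated_cv_auroc`, the seed handling, and the synthetic data are illustrative, and the defaults simply mirror the settings quoted in the text (512 trees, one sample per leaf, 25 repeats of 4 folds).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import RepeatedStratifiedKFold

def repeated_cv_auroc(X, y, n_trees=512, n_repeats=25, n_splits=4, seed=0):
    """AUROC of a random forest on every fold of a repeated stratified k-fold split."""
    cv = RepeatedStratifiedKFold(n_splits=n_splits, n_repeats=n_repeats,
                                 random_state=seed)
    scores = []
    for train_idx, test_idx in cv.split(X, y):
        clf = RandomForestClassifier(n_estimators=n_trees, min_samples_leaf=1,
                                     random_state=seed)
        clf.fit(X[train_idx], y[train_idx])
        prob_case = clf.predict_proba(X[test_idx])[:, 1]
        scores.append(roc_auc_score(y[test_idx], prob_case))
    return np.array(scores)

# Synthetic stand-in for an abundance table: 40 samples, 10 features,
# with feature 0 shifted in the "case" group.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 20)
X = rng.normal(size=(40, 10))
X[:, 0] += 2.0 * y

# Smaller settings keep the demonstration fast; the defaults give 100 scores.
scores = repeated_cv_auroc(X, y, n_trees=20, n_repeats=2)
mean_auroc = scores.mean()
```

With four folds, each split trains on 75% of the samples and tests on the remaining 25%, which matches the proportions stated in the text.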

Bi-directional mediation analysis

To reveal putative causal relationships among PDAC, the microbiome, and the exposome, we implemented a bi-directional mediation analysis. This approach allowed us to investigate the mediating effects of the microbiome and exposome on the development and progression of PDAC. We focused on 23 exposures and 9 species that exhibited significant associations with PDAC (FDR_Spearman < 0.05). The FDR was calculated using the Benjamini–Hochberg procedure. We conducted the bi-directional mediation analysis with interactions, employing the model $y = x + m + x \times m$, where $y$ represents the outcome, $x$ the independent variable, and $m$ the mediator. All of these analyses were performed using the mediation package in R.
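The Benjamini–Hochberg adjustment used for the FDR threshold can be written out in a few lines. The sketch below is a generic Python implementation of the standard procedure with a hypothetical function name; the p-values are made up for illustration, and it is not the code used in the study.

```python
import numpy as np

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale the i-th smallest p-value by m / i.
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working from the largest p-value downwards.
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]
    out = np.empty(m)
    out[order] = np.clip(adjusted, 0.0, 1.0)
    return out

pvals = [0.01, 0.04, 0.03, 0.005]
fdr = benjamini_hochberg(pvals)  # [0.02, 0.04, 0.04, 0.02]
significant = fdr < 0.05
```

Features whose adjusted value falls below 0.05 would be the ones carried forward to the mediation models.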

Acknowledgments

This work was supported by the National Key R&D Program of China (2020YFA0712400), the National Natural Science Foundation of China (NSFC, 62272270 and 11931008), and the Shandong University multidisciplinary research and innovation team of young scholars (2020QNQT017).

Author contributions

B.L. and K.N. conceived of and proposed the idea and designed the study. Y.Z. and H.Z. performed the experiments, analyzed the data, and contributed to editing and proofreading the manuscript. All authors read and approved the final manuscript.

Declaration of interests

The authors declare that they have no competing interests.

Published: February 21, 2024

Footnotes

Supplemental information can be found online at https://doi.org/10.1016/j.isci.2024.109294.

Contributor Information

Bingqiang Liu, Email: bingqiang@sdu.edu.cn.

Kang Ning, Email: ningkang@hust.edu.cn.

Supplemental information

Document S1. Figures S1–S3

mmc1.pdf (553.7 KB)

Table S1. Spearman Rank Correlation Results, related to Figure 3

mmc2.xlsx (16.7 KB)

Table S2. The results of differential abundant test between PDAC and controls, related to Figure 1

mmc3.xlsx (13.1 KB)

Table S3. Detailed interpretation of exposome data, related to Figure 2

mmc4.xlsx (15.8 KB)

Table S4. The IDs of matched and unmatched cohort, related to Figure 1

mmc5.xlsx (16 KB)

References

1. Vincent A., Herman J., Schulick R., Hruban R.H., Goggins M. Pancreatic cancer. Lancet. 2011;378:607–620. doi: 10.1016/S0140-6736(10)62307-0.
2. Kamisawa T., Wood L.D., Itoi T., Takaori K. Pancreatic cancer. Lancet. 2016;388:73–85. doi: 10.1016/S0140-6736(16)00141-0.
3. Ilic M., Ilic I. Epidemiology of pancreatic cancer. World J. Gastroenterol. 2016;22:9694–9705. doi: 10.3748/wjg.v22.i44.9694.
4. McGuigan A., Kelly P., Turkington R.C., Jones C., Coleman H.G., McCain R.S. Pancreatic cancer: A review of clinical diagnosis, epidemiology, treatment and outcomes. World J. Gastroenterol. 2018;24:4846–4861. doi: 10.3748/wjg.v24.i43.4846.
5. Ghaddar B., Biswas A., Harris C., Omary M.B., Carpizo D.R., Blaser M.J., De S. Tumor microbiome links cellular programs and immunity in pancreatic cancer. Cancer Cell. 2022;40:1240–1253.e5. doi: 10.1016/j.ccell.2022.09.009.
6. Aykut B., Pushalkar S., Chen R., Li Q., Abengozar R., Kim J.I., Shadaloey S.A., Wu D., Preiss P., Verma N., et al. The fungal mycobiome promotes pancreatic oncogenesis via activation of MBL. Nature. 2019;574:264–267. doi: 10.1038/s41586-019-1608-2.
7. Poore G.D., Kopylova E., Zhu Q., Carpenter C., Fraraccio S., Wandro S., Kosciolek T., Janssen S., Metcalf J., Song S.J., et al. Microbiome analyses of blood and tissues suggest cancer diagnostic approach. Nature. 2020;579:567–574. doi: 10.1038/s41586-020-2095-1. [Retracted]
8. Wang Y., Li Z., Zheng S., Zhou Y., Zhao L., Ye H., Zhao X., Gao W., Fu Z., Zhou Q., et al. Expression profile of long non-coding RNAs in pancreatic cancer and their clinical significance as biomarkers. Oncotarget. 2015;6:35684–35698. doi: 10.18632/oncotarget.5533.
9. Blyuss O., Zaikin A., Cherepanova V., Munblit D., Kiseleva E.M., Prytomanova O.M., Duffy S.W., Crnogorac-Jurcevic T. Development of PancRISK, a urine biomarker-based risk score for stratified screening of pancreatic cancer patients. Br. J. Cancer. 2020;122:692–696. doi: 10.1038/s41416-019-0694-0.
10. Herreros-Villanueva M., Bujanda L. Glypican-1 in exosomes as biomarker for early detection of pancreatic cancer. Ann. Transl. Med. 2016;4:64. doi: 10.3978/j.issn.2305-5839.2015.10.39.
11. Goonetilleke K.S., Siriwardena A.K. Systematic review of carbohydrate antigen (CA 19-9) as a biochemical marker in the diagnosis of pancreatic cancer. Eur. J. Surg. Oncol. 2007;33:266–270. doi: 10.1016/j.ejso.2006.10.004.
12. Xing H., Wang J., Wang Y., Tong M., Hu H., Huang C., Li D. Diagnostic Value of CA 19-9 and Carcinoembryonic Antigen for Pancreatic Cancer: A Meta-Analysis. Gastroenterol. Res. Pract. 2018;2018 doi: 10.1155/2018/8704751.
13. Farrell J.J., Zhang L., Zhou H., Chia D., Elashoff D., Akin D., Paster B.J., Joshipura K., Wong D.T.W. Variations of oral microbiota are associated with pancreatic diseases including pancreatic cancer. Gut. 2012;61:582–588. doi: 10.1136/gutjnl-2011-300784.
14. Michaud D.S., Izard J., Wilhelm-Benartzi C.S., You D.H., Grote V.A., Tjønneland A., Dahm C.C., Overvad K., Jenab M., Fedirko V., et al. Plasma antibodies to oral bacteria and risk of pancreatic cancer in a large European prospective cohort study. Gut. 2013;62:1764–1770. doi: 10.1136/gutjnl-2012-303006.
15. Olson S.H., Satagopan J., Xu Y., Ling L., Leong S., Orlow I., Saldia A., Li P., Nunes P., Madonia V., et al. The oral microbiota in patients with pancreatic cancer, patients with IPMNs, and controls: a pilot study. Cancer Causes Control. 2017;28:959–969. doi: 10.1007/s10552-017-0933-8.
16. Nagata N., Nishijima S., Kojima Y., Hisada Y., Imbe K., Miyoshi-Akiyama T., Suda W., Kimura M., Aoki R., Sekine K., et al. Metagenomic Identification of Microbial Signatures Predicting Pancreatic Cancer From a Multinational Study. Gastroenterology. 2022;163:222–238. doi: 10.1053/j.gastro.2022.03.054.
17. Kartal E., Schmidt T.S.B., Molina-Montes E., Rodríguez-Perales S., Wirbel J., Maistrenko O.M., Akanni W.A., Alashkar Alhamwe B., Alves R.J., Carrato A., et al. A faecal microbiota signature with high specificity for pancreatic cancer. Gut. 2022;71:1359–1372. doi: 10.1136/gutjnl-2021-324755.
18. Half E., Keren N., Reshef L., Dorfman T., Lachter I., Kluger Y., Reshef N., Knobler H., Maor Y., Stein A., et al. Fecal microbiome signatures of pancreatic cancer patients. Sci. Rep. 2019;9 doi: 10.1038/s41598-019-53041-4.
19. Ren Z., Jiang J., Xie H., Li A., Lu H., Xu S., Zhou L., Zhang H., Cui G., Chen X., et al. Gut microbial profile analysis by MiSeq sequencing of pancreatic carcinoma patients in China. Oncotarget. 2017;8:95176–95191. doi: 10.18632/oncotarget.18820.
20. Inamura K., Hamada T., Bullman S., Ugai T., Yachida S., Ogino S. Cancer as microenvironmental, systemic and environmental diseases: opportunity for transdisciplinary microbiomics science. Gut. 2022;71:2107–2122. doi: 10.1136/gutjnl-2022-327209.
21. Samokhvalov A.V., Rehm J., Roerecke M. Alcohol Consumption as a Risk Factor for Acute and Chronic Pancreatitis: A Systematic Review and a Series of Meta-analyses. EBioMedicine. 2015;2:1996–2002. doi: 10.1016/j.ebiom.2015.11.023.
22. Midha S., Chawla S., Garg P.K. Modifiable and non-modifiable risk factors for pancreatic cancer: A review. Cancer Lett. 2016;381:269–277. doi: 10.1016/j.canlet.2016.07.022.
23. Wood H.E., Gupta S., Kang J.Y., Quinn M.J., Maxwell J.D., Mudan S., Majeed A. Pancreatic cancer in England and Wales 1975-2000: patterns and trends in incidence, survival and mortality. Aliment. Pharmacol. Ther. 2006;23:1205–1214. doi: 10.1111/j.1365-2036.2006.02860.x.
24. Iodice S., Gandini S., Maisonneuve P., Lowenfels A.B. Tobacco and the risk of pancreatic cancer: a review and meta-analysis. Langenbeck's Arch. Surg. 2008;393:535–545. doi: 10.1007/s00423-007-0266-2.
25. Raimondi S., Maisonneuve P., Lowenfels A.B. Epidemiology of pancreatic cancer: an overview. Nat. Rev. Gastroenterol. Hepatol. 2009;6:699–708. doi: 10.1038/nrgastro.2009.177.
26. Klein A.P., Brune K.A., Petersen G.M., Goggins M., Tersmette A.C., Offerhaus G.J.A., Griffin C., Cameron J.L., Yeo C.J., Kern S., Hruban R.H. Prospective risk of pancreatic cancer in familial pancreatic cancer kindreds. Cancer Res. 2004;64:2634–2638. doi: 10.1158/0008-5472.can-03-3823.
27. Bosetti C., Rosato V., Li D., Silverman D., Petersen G.M., Bracci P.M., Neale R.E., Muscat J., Anderson K., Gallinger S., et al. Diabetes, antidiabetic medications, and pancreatic cancer risk: an analysis from the International Pancreatic Cancer Case-Control Consortium. Ann. Oncol. 2014;25:2065–2072. doi: 10.1093/annonc/mdu276.
28. Sah R.P., Nagpal S.J.S., Mukhopadhyay D., Chari S.T. New insights into pancreatic cancer-induced paraneoplastic diabetes. Nat. Rev. Gastroenterol. Hepatol. 2013;10:423–433. doi: 10.1038/nrgastro.2013.49.
29. Arslan A.A., Helzlsouer K.J., Kooperberg C., Shu X.O., Steplowski E., Bueno-de-Mesquita H.B., Fuchs C.S., Gross M.D., Jacobs E.J., Lacroix A.Z., et al. Anthropometric measures, body mass index, and pancreatic cancer: a pooled analysis from the Pancreatic Cancer Cohort Consortium (PanScan). Arch. Intern. Med. 2010;170:791–802. doi: 10.1001/archinternmed.2010.63.
30. Vamanu E., Rai S.N. The Link between Obesity, Microbiota Dysbiosis, and Neurodegenerative Pathogenesis. Diseases. 2021;9:45. doi: 10.3390/diseases9030045.
31. Shoelson S.E., Lee J., Yuan M. Inflammation and the IKKβ/IκB/NF-κB axis in obesity- and diet-induced insulin resistance. Int. J. Obes. Relat. Metab. Disord. 2003;27:S49–S52. doi: 10.1038/sj.ijo.0802501.
32. Engen P.A., Green S.J., Voigt R.M., Forsyth C.B., Keshavarzian A. The Gastrointestinal Microbiome: Alcohol Effects on the Composition of Intestinal Microbiota. Alcohol Res. 2015;37:223–236. doi: 10.35946/arcr.v37.2.07.
33. Dziewiecka H., Buttar H.S., Kasperska A., Ostapiuk–Karolczuk J., Domagalska M., Cichoń J., Skarpańska-Stejnborn A. Physical activity induced alterations of gut microbiota in humans: a systematic review. BMC Sports Sci. Med. Rehabil. 2022;14:122. doi: 10.1186/s13102-022-00513-2.
34. Yadav D., Lowenfels A.B. The Epidemiology of Pancreatitis and Pancreatic Cancer. Gastroenterology. 2013;144:1252–1261. doi: 10.1053/j.gastro.2013.01.068.
35. Stekhoven D.J., Bühlmann P. MissForest—non-parametric missing value imputation for mixed-type data. Bioinformatics. 2012;28:112–118. doi: 10.1093/bioinformatics/btr597.
36. Guo P., Zhang K., Ma X., He P. Clostridium species as probiotics: potentials and challenges. J. Anim. Sci. Biotechnol. 2020;11:24. doi: 10.1186/s40104-019-0402-1.
37. Scher J.U., Abramson S.B. The microbiome and rheumatoid arthritis. Nat. Rev. Rheumatol. 2011;7:569–578. doi: 10.1038/nrrheum.2011.121.
38. Ponath F., Tawk C., Zhu Y., Barquist L., Faber F., Vogel J. RNA landscape of the emerging cancer-associated microbe Fusobacterium nucleatum. Nat. Microbiol. 2021;6:1007–1020. doi: 10.1038/s41564-021-00927-7.
39. Alkhayyat M., Abou Saleh M., Grewal M.K., Abureesh M., Mansoor E., Simons-Linares C.R., Abelson A., Chahal P. Pancreatic manifestations in rheumatoid arthritis: a national population-based study. Rheumatology. 2021;60:2366–2374. doi: 10.1093/rheumatology/keaa616.
40. Wang L., Ding Z., Tao Z., Liu Y., Fu Y. Generative Multi-View Human Action Recognition. 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea (South). 2019; pp. 6211–6220.
41. Wang T., Shao W., Huang Z., Tang H., Zhang J., Ding Z., Huang K. MOGONET integrates multi-omics data using graph convolutional networks allowing patient classification and biomarker identification. Nat. Commun. 2021;12:3445. doi: 10.1038/s41467-021-23774-w.
42. Cevirgen E. 2022. Exploring the Potential of the Human Microbiome for Colorectal and Pancreatic Cancer Screening.
43. Lu H., Ren Z., Li A., Li J., Xu S., Zhang H., Jiang J., Yang J., Luo Q., Zhou K., et al. Tongue coating microbiome data distinguish patients with pancreatic head cancer from healthy controls. J. Oral Microbiol. 2019;11 doi: 10.1080/20002297.2018.1563409.
44. Sun Y., Goodison S., Li J., Liu L., Farmerie W. Improved breast cancer prognosis through the combination of clinical and genetic markers. Bioinformatics. 2007;23:30–37. doi: 10.1093/bioinformatics/btl543.
45. Spicker J.S., Brunak S., Frederiksen K.S., Toft H. Integration of clinical chemistry, expression, and metabolite data leads to better toxicological class separation. Toxicol. Sci. 2008;102:444–454. doi: 10.1093/toxsci/kfn001.
46. Abdi H., Williams L.J., Valentin D. Multiple factor analysis: principal component analysis for multitable and multiblock data sets. WIREs Comput. Stats. 2013;5:149–179.
47. Ning X., Wang X., Xu S., Cai W., Liping Z., Yu L., Li W. A review of research on co-training. Concurrency Comput. Pract. Ex. 2021;35:e6276. doi: 10.1002/cpe.6276.
48. Huang Z., Zhan X., Xiang S., Johnson T.S., Helm B., Yu C.Y., Zhang J., Salama P., Rizkalla M., Han Z., Huang K. SALMON: Survival Analysis Learning With Multi-Omics Neural Networks on Breast Cancer. Front. Genet. 2019;10 doi: 10.3389/fgene.2019.00166.
49. Shannon C.E. A mathematical theory of communication. Bell Syst. Tech. J. 1948;27:379–423. doi: 10.1002/j.1538-7305.1948.tb01338.x.
50. Whittaker R.H. Evolution and measurement of species diversity. Taxon. 1972;21:213–251.

Data Availability Statement

•
All the raw data enrolled in this study have been deposited to the European Nucleotide Archive (ENA). The accession number is listed in the key resources table.
•
All source code is available at https://github.com/Yuli-SDU/MOCO-GCN.
•
Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.


[bib39] 39.Alkhayyat M., Abou Saleh M., Grewal M.K., Abureesh M., Mansoor E., Simons-Linares C.R., Abelson A., Chahal P. Pancreatic manifestations in rheumatoid arthritis: a national population-based study. Rheumatology. 2021;60:2366–2374. doi: 10.1093/rheumatology/keaa616. [DOI] [PubMed] [Google Scholar]

[bib40] 40.Wang L., Ding Z., Tao Z., Liu Y., Fu Y. 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea (South) 2019. Generative Multi-View Human Action Recognition; pp. 6211–6220. [Google Scholar]

[bib41] 41.Wang T., Shao W., Huang Z., Tang H., Zhang J., Ding Z., Huang K. MOGONET integrates multi-omics data using graph convolutional networks allowing patient classification and biomarker identification. Nat. Commun. 2021;12:3445. doi: 10.1038/s41467-021-23774-w. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib42] 42.Cevirgen E. 2022. Exploring the Potential of the Human Microbiome for Colorectal and Pancreatic Cancer Screening. [Google Scholar]

[bib43] 43.Lu H., Ren Z., Li A., Li J., Xu S., Zhang H., Jiang J., Yang J., Luo Q., Zhou K., et al. Tongue coating microbiome data distinguish patients with pancreatic head cancer from healthy controls. J. Oral Microbiol. 2019;11 doi: 10.1080/20002297.2018.1563409. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib44] 44.Sun Y., Goodison S., Li J., Liu L., Farmerie W. Improved breast cancer prognosis through the combination of clinical and genetic markers. Bioinformatics. 2007;23:30–37. doi: 10.1093/bioinformatics/btl543. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib45] 45.Spicker J.S., Brunak S., Frederiksen K.S., Toft H. Integration of clinical chemistry, expression, and metabolite data leads to better toxicological class separation. Toxicol. Sci. 2008;102:444–454. doi: 10.1093/toxsci/kfn001. [DOI] [PubMed] [Google Scholar]

[bib46] 46.Abdi H., Williams L.J., Valentin D. Multiple factor analysis: principal component analysis for multitable and multiblock data sets. WIREs Comput. Stats. 2013;5:149–179. [Google Scholar]

[bib47] 47.Ning X., Wang X., Xu S., Cai W., Liping Z., Yu L., Li W. A review of research on co-training. Concurrency Comput. Pract. Ex. 2021;35:e6276. doi: 10.1002/cpe.6276. [DOI] [Google Scholar]

[bib48] 48.Huang Z., Zhan X., Xiang S., Johnson T.S., Helm B., Yu C.Y., Zhang J., Salama P., Rizkalla M., Han Z., Huang K. SALMON: Survival Analysis Learning With Multi-Omics Neural Networks on Breast Cancer. Front. Genet. 2019;10 doi: 10.3389/fgene.2019.00166. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib49] 49.Shannon C.E. A mathematical theory of communication. Bell Syst. Tech. J. 1948;27:379–423. doi: 10.1002/j.1538-7305.1948.tb01338.x. [DOI] [Google Scholar]

[bib50] 50.Whittaker R.H. Evolution and measurement of species diversity. Taxon. 1972;21:213–251. [Google Scholar]
