Non-invasive biomarkers for early diagnosis of pancreatic cancer risk: metabolite genomewide association study based on the KCPS-II cohort

Youngmin Han; Keum Ji Jung; Unchong Kim; Chan Il Jeon; Kwangbae Lee; Sun Ha Jee

doi:10.1186/s12967-023-04670-x

2023 Dec 4;21:878.

PMCID: PMC10694897 PMID: 38049855

Abstract

Background

Pancreatic cancer is a lethal disease with a high mortality rate. The difficulty of early diagnosis is one of its primary causes. Therefore, we aimed to discover non-invasive biomarkers that facilitate the early diagnosis of pancreatic cancer risk.

Methods

The study subjects were randomly selected from the Korean Cancer Prevention Study-II and matched by age, sex, and blood collection point [pancreatic cancer incidence (n = 128) vs. control (n = 256)]. The baseline serum samples were analyzed by non-targeted metabolomics, and XGBoost was used to select significant metabolites related to pancreatic cancer incidence. A genomewide association study of the selected metabolites was used to identify associated single nucleotide polymorphisms (SNPs). Moderation and mediation analyses were conducted to explore the variables related to pancreatic cancer risk.

Results

Eleven discriminant metabolites were selected by applying a feature importance cut-off of 4.0 in XGBoost. Five SNPs were significant in both the metabolite-GWAS (p ≤ 5 × 10^–6) and the logistic regression analysis. Among them, the pair metabolites of rs2370981, rs55870181, and rs72805402 displayed different network patterns with clinical/biochemical indicators when effect allele carriers were compared with non-carriers. In addition, we demonstrated an indirect effect of rs59519100 on pancreatic cancer risk mediated by γ-glutamyl tyrosine, whose association with pancreatic cancer risk was moderated by smoking status. The predictive ability for pancreatic cancer was highest in the model combining five SNPs and four pair metabolites with the conventional risk factors (AUC: 0.738 [0.661–0.815]).

Conclusions

Signatures involving the metabolites and SNPs discovered in the present research may be closely associated with the pathogenesis of pancreatic cancer and may serve as predictive biomarkers allowing early pancreatic cancer diagnosis and therapy.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12967-023-04670-x.

Keywords: Pancreatic cancer, Predictive biomarker, Genetic variants, LC/MS metabolomics, Metabolite genomewide association study

Introduction

The pancreas is an organ responsible for producing digestive juices and regulating blood glucose levels. Pancreatic cancer is highly lethal because early diagnosis is challenging and the likelihood of metastasis to other organs is very high [1]. Pancreatic cancer accounts for approximately 3% of all cancers in the United States, and it is more common in men than in women [2]. According to the National Statistical Office of Korea, 6931 people (3600 men and 3331 women) died from pancreatic cancer in 2021, accounting for 8.4% of all cancer deaths [3].

The cause of pancreatic cancer is unclear, but smoking, being overweight, diabetes, and a relevant family history act as risk factors. Smoking is a crucial risk factor for chronic pancreatic cancer [4]. In a study involving 2009 pancreatic cancer cases and 1532 controls from the International Pancreatic Cancer Cohort, smokers showed a 1.72-fold higher risk of pancreatic cancer than non-smokers. In addition, the risk of pancreatic cancer was reported to increase with the number of cigarettes smoked [5].

Recently, several studies have been conducted on pancreatic cancer. Currently, the most widely used single tumor marker for pancreatic cancer is carbohydrate antigen (CA) 19–9, which is elevated in 80% of all pancreatic cancer patients. However, as its specificity is low for screening tests, it is usually used to determine the stage and prognosis of pancreatic cancer or to monitor its recurrence [6, 7]. In addition, Hwang et al. [8] suggested that miR-21 expression is closely related to anticancer drug resistance; this aspect can be applied to predict anticancer drug resistance and the clinical outcomes for Korean pancreatic cancer patients. However, there are no biomarkers for the early diagnosis or early detection of pancreatic cancer risk yet.

Multi-omics is a method of comprehensively analyzing the data generated at various molecular levels, such as the genome, transcriptome, proteome, and metabolome; it has been applied in multiple fields of disease research [9, 10]. This approach can provide systemic clues to understand the underlying metabolic changes occurring over the course of a disease. Indeed, proteomics on genetically engineered mouse models with early and advanced stages of pancreatic cancer identified candidate proteome markers applicable to early detection [11]. Moreover, for ovarian cancer, which is mainly diagnosed at a late stage, multi-omics technology has been widely used to discover several valuable biomarkers for early diagnosis [12].

This study aims to discover non-invasive biomarkers for predicting pancreatic cancer risk through multi-omics technology. Genotyping data and non-targeted metabolite screening data from Korean subjects in the Korean Cancer Prevention Study (KCPS)-II were analyzed together using several statistical approaches. We expect that our findings, including genomic and metabolomic biomarkers, can serve as the basis for research on pancreatic cancer pathogenesis.

Materials and methods

Study population

The study subjects were selected from the KCPS-II cohort. Briefly, the KCPS-II subjects were recruited through 18 health promotion centers across South Korea from April 2004. After their enrollment, hospital admission records, death registries, and National Cancer Center registry data were collected during the follow-up period. Written informed consent for cohort registration and secondary research was obtained from all cohort subjects, and their blood samples were collected.

For the current research, subjects aged 25–71 years were randomly selected from the KCPS-II. We formed two groups matched in a 1:2 ratio by age, sex, and blood collection point [pancreatic cancer incidence group (n = 128) vs. control (n = 256)]. Subjects who were cancer-free at the time of enrollment but later developed pancreatic cancer during the follow-up period were assigned to the case group.

All procedures in the current research involving human participants were performed in accordance with the ethical standards of the Institutional Review Board at the Yonsei University Health System under the Helsinki Declaration [IRB Number: 4-2022-1136].

Smoking history

Each participant answered a self-administered questionnaire concerning their smoking habits (never-smoker = 0, ex-smoker = 1, or a current smoker = 2). The smoking amount of current smokers was also investigated, but due to several missing values, this data was not used in this study.

Metabolome analysis

Non-targeted metabolomics

UHPLC-MS/MS analysis

The prepared serum samples were precipitated with cold acetonitrile (Wako Pure Chemical Industries, Osaka, Japan) (1:3, v/v) and centrifuged for 15 min (13,000 rpm, 4 ℃). The supernatant was then separated and dried in a vacuum concentrator (HyperVAC-MAX, Hanil Scientific Inc., Gimpo, Korea) without heating. Next, the residue was reconstituted in 200 μL of 10% methanol (J.T. Baker^® Chemicals; Avantor Performance Materials, Inc., Radnor, PA, USA) and filtered through a 0.45-μm polyvinylidene difluoride syringe filter. L-Leucine-1-¹³C (Sigma-Aldrich, Saint Louis, MO, USA) was used as an internal standard (ISTD). The quality control (QC) sample was prepared by pooling all the serum samples and following the same steps.

The serum samples were injected into the Acquity UPLC-BEH-C18 column (Waters, Milford, MA, USA) connected to the Thermo UHPLC system (Ultimate 3000 BioRS; Dionex, Thermo Fisher Scientific, Bremen, Germany). The column temperature was maintained at 50 ℃. Two mobile phases [A, composed of 0.1% formic acid in LC–MS-grade water (Thermo Fisher Scientific, Fair Lawn, NJ, USA); B, composed of 0.1% formic acid in LC–MS-grade methanol (Thermo Fisher Scientific, Fair Lawn, NJ, USA)] formed a 17-min gradient to separate the compounds in the samples. A Q Exactive Plus Orbitrap (Thermo Fisher Scientific, Waltham, MA, USA) was coupled to the UHPLC system for detection. MS was performed in positive electrospray ionization mode (ESI+) with a collision energy of 30, a spray voltage of 3.5 kV, a nitrogen sheath gas flow rate of 60 (arbitrary units), and an auxiliary gas flow rate of 20 (arbitrary units). Full scan-ddms² mode with a scan range of 80–1000 mass-to-charge (m/z) was used to collect data.

The QC samples were measured after every 10th prepared serum sample to monitor sensitivity and reproducibility. In addition, the intra-assay and inter-assay variations were assessed using replicate measurements of QC samples over a few days.

Identification of metabolites

Compound Discoverer 3.2 software (Thermo Fisher Scientific, San Jose, CA, USA) was used for processing the raw spectra. Alignment and normalization were performed using the QCs in the program. Features detected in < 80% of the QC samples were discarded. Processed features were identified with reference to the online databases ChemSpider (http://www.chemspider.com), LIPID MAPS (https://www.lipidmaps.org), mzCloud (https://www.mzcloud.org), and Kyoto Encyclopedia of Genes and Genomes (KEGG; https://www.genome.jp/kegg).

Genotyping

DNA was genotyped using the KORV1.0–96 Array (Affymetrix, Santa Clara, CA, USA) provided by the K-CHIP consortium and Affymetrix Genomewide Human SNP Array 5.0 (Affymetrix Inc.). Markers with a high missing rate (> 5%), individuals with a high missing rate (> 5%), and SNPs with a minor allele frequency < 0.05 or in a significant deviation from the Hardy–Weinberg equilibrium (p < 1.0 × 10^–6) were excluded for quality control.
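The marker-level filters above can be illustrated with a minimal sketch. It is not the pipeline used in the study: the function names are hypothetical, the genotype counts in the usage note are invented, and the Hardy–Weinberg check uses a simple 1-df chi-square approximation, whereas dedicated genetics software typically applies an exact test.

```python
from math import erfc, sqrt

def maf(n_aa, n_ab, n_bb):
    """Minor allele frequency from genotype counts (AA, AB, BB)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    return min(p, 1 - p)

def hwe_chi2_p(n_aa, n_ab, n_bb):
    """P-value of a 1-df chi-square test for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return erfc(sqrt(chi2 / 2))  # chi-square survival function for 1 df

def passes_qc(n_aa, n_ab, n_bb, n_missing,
              maf_min=0.05, miss_max=0.05, hwe_min=1e-6):
    """Apply the three marker filters with the thresholds quoted in the text."""
    called = n_aa + n_ab + n_bb
    missing_rate = n_missing / (called + n_missing)
    return (missing_rate <= miss_max
            and maf(n_aa, n_ab, n_bb) >= maf_min
            and hwe_chi2_p(n_aa, n_ab, n_bb) >= hwe_min)
```

For example, a marker with counts (250, 90, 9) and no missing calls passes, while (340, 9, 0) fails on allele frequency and (200, 0, 149) fails on Hardy–Weinberg equilibrium.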

Statistical analysis

All statistical analyses were conducted using SPSS 26 (IBM Corp, Armonk, NY, USA), R 4.1.3, and Python 3.9.12. We performed independent t-tests and Mann–Whitney U-tests to evaluate the differences in the clinical/biochemical variables between the two groups. The skewed variables were logarithmically transformed. For nominal variables, a Chi-square test was applied. The data are expressed as the mean ± SE, and two-tailed p-values < 0.05 were considered to indicate statistical significance.

For multivariate analyses, the normalized metabolite data were exported from Compound Discoverer 3.2. After Pareto-scaling and logarithmic transformation, the eXtreme Gradient Boosting (XGBoost) model was fitted using Python. The log-loss function was applied as the objective for the binary outcome (control, 0; case, 1). To optimize the model hyperparameters, we limited the maximum depth of the trees and eta while increasing n_estimators so as to help prevent overfitting; a model with too few weak learners (n_estimators) and deep trees may fit noise, and reducing eta diminishes the contribution of each tree to the model. As a result, the XGBoost model was fitted with the following parameters to achieve a high AUC in the test set: n_estimators, 50; learning rate, 0.15; alpha, 0.001; max depth, 2; min child weight, 5; and eta, 0.1.
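As an illustration of the preprocessing step, the following minimal sketch log-transforms one metabolite's abundances and then applies Pareto scaling (centering on the mean and dividing by the square root of the standard deviation). The order of the two operations and the use of the natural logarithm are assumptions, since the text does not specify them, and the function name is hypothetical.

```python
from math import log, sqrt
from statistics import mean, stdev

def log_pareto_scale(values):
    """Natural-log transform, then Pareto-scale one feature.

    Pareto scaling divides the centered values by the square root of the
    standard deviation, which shrinks large features less aggressively than
    unit-variance scaling.
    """
    logged = [log(v) for v in values]  # requires strictly positive abundances
    center = mean(logged)
    scale = sqrt(stdev(logged))        # sample standard deviation
    return [(x - center) / scale for x in logged]

scaled = log_pareto_scale([1.0, 10.0, 100.0])
```

The scaled values are centered on zero, so features with very different raw abundances become comparable before model fitting.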

Metabolite-GWAS was performed using PLINK 2.0. Next, logistic regression analysis was performed after adjusting for age and sex to evaluate the association between the significant SNPs and pancreatic cancer. The predictive ability for pancreatic cancer using the biomarkers discovered in this study was assessed through regression analysis. Furthermore, we tested whether smoking status is a significant moderator of the association between metabolites (independent variable) and pancreatic cancer incidence (dependent variable) by using the p-value of the coefficient of the interaction term (metabolite × smoking status). In addition, we conducted a mediation analysis to test whether a metabolite is a significant mediator of the association between smoking status (independent variable) and pancreatic cancer incidence (dependent variable) using the mediate function in the R mediation package. The Python and R codes used in the current research are provided in Additional file 2: Data S1.

A network model was created in the carrier and the non-carrier groups of effect alleles so as to visualize the relationships between clinical/biochemical indicators and paired metabolites of each SNP based on partial correlation. To reflect the difference in the quantitative abundance between the pancreatic cancer incidence and control groups, we calculated the z-score of each variable.
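To make the idea of a partial correlation concrete, the sketch below computes the first-order case: the correlation between two variables after removing the linear influence of a single third variable. This is only an illustration under that simplifying assumption; the text does not state which variables were conditioned on, and the function names are hypothetical.

```python
from math import sqrt
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def partial_corr(x, y, z):
    """Correlation of x and y after adjusting both for z (first order)."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
```

When the control variable is uncorrelated with both inputs, the partial correlation reduces to the ordinary Pearson correlation.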

Results

Anthropometric and clinical/biochemical characteristics at the baseline

After excluding 35 subjects without genotyping data, 349 subjects were included in the final analysis [pancreatic cancer incidence group (n = 113) vs. control (n = 236)]. The baseline characteristics of the total subjects are presented in Table 1. No significant differences were noted between the pancreatic cancer incidence and control groups. To summarize, the mean age was 52.7 years in the pancreatic cancer incidence group and 52.4 years in the control group (p from t-test = 0.991). The pancreatic cancer incidence group included 77.0% male and 23.0% female, while the control group included 73.7% male and 26.3% female, indicating no significant difference between the groups (p = 0.511). No statistical difference was noted in BMI, with the pancreatic cancer incidence and control groups showing respective mean values of 24.6 and 24.3 (p = 0.238). In addition, the two groups showed no significant difference in CA 19–9 (pancreatic cancer incidence group, 20.0 ± 2.48; control group, 8.37 ± 0.526; p = 0.346). The Chi-squared test confirmed the lack of any significant difference in the frequency of current smokers between the two groups (pancreatic cancer incidence group, 31.7%; control group, 30.3%; p = 0.116).

Table 1.

Baseline clinical and biochemical characteristics of subjects

	Total (n = 349)
Variable	Control (n = 236)	Pancreatic cancer incidence (n = 113)	p
Age (year)	52.4 ± 0.588	52.7 ± 0.832	0.991
Male/female n, (%)	174 (73.7)/62 (26.3)	87 (77.0)/26 (23.0)	0.511
Current smoker n, (%)	70 (30.3)	33 (31.7)	0.116
Body mass index (kg/m²)^†	24.3 ± 0.187	24.6 ± 0.256	0.238
Systolic blood pressure (mmHg)^†	120.8 ± 0.940	121.2 ± 1.24	0.929
Diastolic blood pressure (mmHg)^∮	75.0 ± 0.701	76.7 ± 0.833	0.571
Glucose (mg/dL)^†	95.6 ± 1.37	101.4 ± 2.90	0.439
White blood cell (10³/μL)^†	5.99 ± 0.109	8.26 ± 2.14	0.536
Albumin (g/dL)^†	4.53 ± 0.017	4.51 ± 0.027	0.199
Total cholesterol (mg/dL)^∮	193.6 ± 2.28	192.0 ± 3.27	0.669
Triglyceride (mg/dL)^∮	151.5 ± 6.79	141.1 ± 7.04	0.686
HDL-cholesterol (mg/dL)^†	50.3 ± 0.717	49.4 ± 1.01	0.286
LDL-cholesterol (mg/dL)^†	115.8 ± 2.08	117.0 ± 2.95	0.789
AST (IU/L)^†	26.0 ± 1.48	24.9 ± 0.847	0.986
ALT (IU/L)^†	27.4 ± 2.18	27.2 ± 1.50	0.353
GGT (IU/L)^†	41.8 ± 3.44	41.8 ± 4.20	0.574
ALP (IU/L)^†	121.5 ± 4.49	131.4 ± 6.84	0.223
Bilirubin (mg/dL)^†	0.881 ± 0.023	0.894 ± 0.040	0.710
Uric acid (mg/dL)	5.56 ± 0.092	5.39 ± 0.123	0.282
Blood urea nitrogen (mg/dL)^†	14.6 ± 0.237	14.6 ± 0.337	0.722
Creatinine (mg/dL)^†	1.00 ± 0.013	0.998 ± 0.017	0.164
CA 19–9 (U/mL)^†	8.37 ± 0.526	20.0 ± 2.48	0.346

Mean ± standard error (SE). Comparisons were conducted between the two groups (control vs. pancreatic cancer incidence). Continuous variables were tested by an independent t-test, and variables marked with ∮ were tested after logarithmic transformation. Continuous variables with a nonnormal distribution, even after logarithmic transformation, were tested by a Mann–Whitney U test, and these variables are marked with †. Smoking status was tested by a Chi-squared test

AST aspartate aminotransferase, ALT alanine aminotransferase, GGT γ-glutamyltransferase, ALP alkaline phosphatase, HDL high-density lipoprotein, LDL low-density lipoprotein
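As a worked check of the sex comparison in Table 1, the following minimal sketch applies a Pearson Chi-squared test to the 2 × 2 table of male/female counts (174/62 in controls, 87/26 in cases). It assumes no continuity correction, which the text does not specify; the function name is hypothetical.

```python
from math import erfc, sqrt

def chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic and p-value for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = erfc(sqrt(chi2 / 2))  # chi-square survival function for 1 df
    return chi2, p

# Rows: control, pancreatic cancer incidence; columns: male, female.
chi2, p = chi2_2x2(174, 62, 87, 26)
print(round(chi2, 3), round(p, 3))
```

Under this assumption the p-value rounds to 0.511, in line with the value reported in Table 1.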

Discriminant metabolites between the pancreatic cancer incidence and control groups

Among the 3165 detected features from MS, 173 metabolites were identified. A heatmap comparing the abundance of identified metabolites between the pancreatic cancer incidence and control groups is shown in Additional file 1: Figure S1.

Before establishing the XGBoost model, the subjects were divided into training and test sets at a 6:4 ratio using a fixed random seed (Additional file 2: Data S2). In the training set, 68 individuals from the pancreatic cancer incidence group and 141 from the control group were included. There was no significant difference in the age and sex distribution between these two groups. The proportion of current smokers in the pancreatic cancer incidence group was 30.9%, which showed a statistical difference from the control group of 30.5% (p = 0.018). In the test set, 45 individuals were from the pancreatic cancer incidence group, while 95 were from the control group. There were no significant differences in terms of age, sex, or smoking status between these two groups.
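A seeded 6:4 split of this kind can be sketched as follows. The seed value, the function name, and the use of simple shuffling are assumptions for illustration; the actual splitting code is in the additional files and may differ.

```python
import random

def split_ids(ids, train_fraction=0.6, seed=0):
    """Shuffle subject IDs reproducibly and split them into train and test."""
    rng = random.Random(seed)  # fixed seed makes the split repeatable
    shuffled = list(ids)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]

# 349 subjects, as in the final analysis set.
train_ids, test_ids = split_ids(range(349))
print(len(train_ids), len(test_ids))
```

With 349 subjects this yields 209 training and 140 test subjects, the same set sizes as described in the text.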

We fitted XGBoost on the training dataset (n = 209) and calculated the feature importance for identifying the effect of metabolites on the fitted model. As a result, 11 metabolites that considerably differed between the groups were selected (feature importance ≥ 4.0), as summarized in Table 2. The levels of serum eicosa-11,14,17-trienoic acid, kynurenic acid, γ-glutamyl tyrosine, lysoPE(18:0/0:0), trans-3'-hydroxy cotinine, and L-leucine were found to be elevated in the pancreatic cancer incidence group. In contrast, the pancreatic cancer incidence group had lower N(6)-methyllysine, palmitic amide, adipic acid, 9-decenoylcarnitine, and 5α-pregnane-3,20-dione levels than the control group.

Table 2.

Identification of meaningful metabolites using XGBoost

Putative identification	HMDB ID	m/z	Formula	Feature importance	Pancreatic cancer incidence/control
Eicosa-11,14,17-trienoic acid	HMDB0244373	306.2560	C₂₀H₃₄O₂	6.0	1.826
Kynurenic acid	HMDB0000715	189.0429	C₁₀H₇NO₃	6.0	1.069
γ-Glutamyl tyrosine	HMDB0011741	310.1166	C₁₄H₁₈N₂O₆	5.0	1.230
N(6)-Methyllysine	HMDB0002038	160.1214	C₇H₁₆N₂O₂	5.0	0.875
LysoPE(18:0/0:0)	HMDB0011130	481.3170	C₂₃H₄₈NO₇P	5.0	1.040
Trans-3'-hydroxy cotinine	HMDB0304504	192.0901	C₁₀H₁₂N₂O₂	4.0	1.130
Palmitic amide	HMDB0012273	255.2563	C₁₆H₃₃NO	4.0	0.915
L-Leucine	HMDB0000687	131.0949	C₆H₁₃NO₂	4.0	1.144
Adipic acid	HMDB0000448	146.0581	C₆H₁₀O₄	4.0	0.795
9-Decenoylcarnitine	HMDB0013205	313.2254	C₁₇H₃₁NO₄	4.0	0.794
5α-Pregnane-3,20-dione	HMDB0003759	316.2398	C₂₁H₃₂O₂	4.0	0.845

Feature importance values ≥ 4.0 are listed in Table 2. Feature importance values were obtained from the XGBoost model of the training set (n = 209) [accuracy, 0.952; precision, 0.983; AUC 0.998], selecting discriminant metabolites related to pancreatic cancer incidence. The pancreatic cancer incidence/control value was calculated using the relative abundance of each metabolite

The performance values of the XGBoost model on the training and test sets are shown in Additional file 2: Data S2. The training set had an accuracy of 0.952, precision of 0.983, recall of 0.868, and AUC of 0.998. In the case of the test set, an accuracy of 0.671, precision of 0.471, recall of 0.178, and AUC of 0.640 were recorded.
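The relationship between these metrics and the underlying confusion matrices can be sketched as below. The counts are not reported in the text; they are inferred here as one set of counts consistent with the stated group sizes (68 cases and 141 controls in training, 45 cases and 95 controls in testing) and the rounded metrics, so they should be read as an assumption.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, and recall from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return accuracy, precision, recall

# Inferred counts (assumption), chosen to match the reported group sizes.
train = classification_metrics(tp=59, fp=1, fn=9, tn=140)
test = classification_metrics(tp=8, fp=9, fn=37, tn=86)

print([round(v, 3) for v in train])
print([round(v, 3) for v in test])
```

Under these inferred counts, the test-set recall of 0.178 corresponds to only 8 of 45 incident cases being flagged, which illustrates how far the model's performance drops outside the training data.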

Metabolite-genomewide association analysis

Using the 11 selected metabolites, we conducted a metabolite-GWAS. We generated Manhattan plots to identify significant SNPs and performed linkage disequilibrium clumping with a threshold of p ≤ 5 × 10^–6 to account for the correlation between genetic variants located close to one another. Logistic regression analysis was performed to evaluate their association with the incidence of pancreatic cancer (Table 3). In particular, the G allele of rs2370981 mapped to NRXN3, strongly related to eicosa-11,14,17-trienoic acid, was identified as a protective allele for pancreatic cancer [OR = 0.371, p = 0.043]. The other four notable SNPs (i.e., rs59519100, rs11164375, rs72805402, and rs55870181) were all associated with a higher risk of pancreatic cancer: rs59519100 showed a significant association with γ-glutamyl tyrosine, rs11164375 with lysoPE(18:0/0:0), and rs72805402 (mapped to ZNF503) and rs55870181 with L-leucine. Manhattan plots for these are presented in Additional file 1: Figure S2.

Table 3.

Genome-wide association analysis of pancreatic cancer-related metabolites

Metabolites	SNP	Position	EA	EAF	Mapped Gene	OR [CI]	p
HMDB0244373 Eicosa-11,14,17-trienoic acid	rs6731366	chr2:132891267 (GRCh38.p14)	A	0.053	NCKAP5	1.740 [0.833–3.633]	0.141
	rs11860247	chr16:16071432 (GRCh38.p14)	A	0.086	ABCC1	1.016 [0.534–1.934]	0.960
	rs201237448	chr3:157725970 (GRCh38.p14)	A	0.053		0.869 [0.388–1.944]	0.732
	rs4541064	chr16:85230276 (GRCh38.p14)	C	0.487	GSE1	1.037 [0.615–1.748]	0.892
	rs114089627	chr3:3658708 (GRCh38.p14)	A	0.016		1.106 [0.503–2.432]	0.802
	rs138810234	chr4:161529556 (GRCh38.p14)	C	0.036	FSTL5	1.430 [0.834–2.450]	0.193
	rs77806269	chr3:12881670 (GRCh38.p14)	T	0.023	LINC02022, LOC105376956	0.749 [0.363–1.548]	0.436
	rs2370981	chr14:79402892 (GRCh38.p14)	G	0.017	NRXN3	0.371 [0.142–0.968]	0.043
	rs117753991	chr16:77579192 (GRCh38.p14)	G	0.024		0.951 [0.518–1.748]	0.873
	rs201592606	chr4:139691697 (GRCh38.p14)	G	0.050	MGST2	1.089 [0.683–1.738]	0.720
HMDB0000715 Kynurenic acid	rs73448444	chr13:28731007 (GRCh38.p14)	G	0.138		1.075 [0.614–1.883]	0.799
	rs200475458	chr17:16403690–703 (GRCh38.p14)	C	0.110		0.833 [0.463–1.498]	0.541
	rs187490	chr5:35045022 (GRCh38.p14)	G	0.311	AGXT2	1.256 [0.787–2.005]	0.339
	rs604140	chr2:67422625 (GRCh38.p14)	C	0.050		1.734 [0.834–3.602]	0.140
	rs78053646	chr12:231126 (GRCh38.p14)	T	0.166	SLC6A13, PARM1	1.233 [0.746–2.039]	0.414
	rs890289	chr4:75047822 (GRCh38.p14)	A	0.054	LOC107986289	0.998 [0.481–2.070]	0.995
	rs9908634	chr17:79431371 (GRCh38.p14)	T	0.062	RBFOX3	1.224 [0.603–2.485]	0.576
	rs12909308	chr15:61777037 (GRCh38.p14)	T	0.284		1.223 [0.770–1.942]	0.393
	rs10407389	chr19:36484416 (GRCh38.p14)	G	0.103	ZNF566	0.996 [0.559–1.775]	0.990
	rs72747726	chr15:69899377 (GRCh38.p14)	G	0.238		1.093 [0.680–1.757]	0.712
HMDB0011741 γ-Glutamyl tyrosine	rs59519100	chr20:33868761 (GRCh38.p14)	T	0.166		1.701 [1.046–2.765]	0.032
	rs193488	chr5:136703440 (GRCh38.p14)	G	0.053		1.920 [0.948–3.890]	0.070
	rs202074299	chr13:90574203 (GRCh38.p14)	T	0.165		1.126 [0.689–1.839]	0.637
HMDB0002038 N(6)-Methyllysine	rs200559669	chr6:123584482 (GRCh38.p14)	C	0.481	TRDN	0.685 [0.415–1.131]	0.139
	rs918171	chr19:3336541 (GRCh38.p14)	C	0.370		0.947 [0.599–1.499]	0.817
	rs2374205	chr4:113984965 (GRCh38.p14)	G	0.143	LOC124900762	0.893 [0.523–1.524]	0.677
	rs116931887	chr6:143808387 (GRCh38.p14)	C	0.384	PHACTR2	0.971 [0.612–1.241]	0.900
HMDB0011130 LysoPE(18:0/0:0)	rs11083238	chr18:27943526 (GRCh38.p14)	T	0.135	CDH2	1.299 [0.744–2.267]	0.358
	rs6731366	chr2:132891267 (GRCh38.p14)	A	0.053	NCKAP5	1.329 [0.615–2.874]	0.469
	rs12059514	chr1:102370790 (GRCh38.p14)	C	0.181		0.987 [0.571–1.706]	0.964
	rs28565987	chr15:88048299 (GRCh38.p14)	A	0.097	NTRK3	0.829 [0.432–1.590]	0.573
	rs2505110	chr10:30181971 (GRCh38.p14)	G	0.248		0.843 [0.514–1.384]	0.500
	rs62525721	chr8:129066916 (GRCh38.p14)	T	0.052		0.986 [0.430–2.259]	0.973
	rs8052560	chr16:88710834 (GRCh38.p14)	C	0.082	CTU2	1.391 [0.734–2.636]	0.312
	rs117723718	chr4:40487925 (GRCh38.p14)	G	0.057	RBM47	1.050 [0.489–2.254]	0.901
	rs28705703	chr6:167502412 (GRCh38.p14)	G	0.066		1.650 [0.823–3.306]	0.158
	rs9829051	chr3:31491669 (GRCh38.p14)	G	0.122		1.149 [0.628–2.102]	0.652
	rs375927045	chr7:16220625 (GRCh38.p14)	C	0.208	CRPPA	0.624 [0.370–1.052]	0.077
	rs881433	chr18:44864508 (GRCh38.p14)	A	0.262	SETBP1	1.293 [0.799–2.092]	0.295
	rs11164375	chr1:102083600 (GRCh38.p14)	T	0.080		2.194 [1.095–4.394]	0.027
	rs117753153	chr3:24689035 (GRCh38.p14)	G	0.050		1.041 [0.446–2.427]	0.927
	rs1923773	chr13:53176219 (GRCh38.p14)	G	0.148		0.999 [0.576–1.732]	0.996
	rs141483946	chr11:13570434 (GRCh38.p14)	A	0.079		1.472 [0.751–2.884]	0.260
	rs9345335	chr6:93188629 (GRCh38.p14)	G	0.400		0.999 [0.605–1.651]	0.998
	rs78505433	chr15:49737092 (GRCh38.p14)	T	0.053		1.642 [0.761–3.547]	0.207
	rs57966757	chr18:5919654 (GRCh38.p14)	A	0.076	LOC121725015	0.938 [0.477–1.844]	0.853
	rs4727289	chr7:93397016 (GRCh38.p14)	G	0.120		0.743 [0.407–1.356]	0.333
	rs55721115	chr14:34277292 (GRCh38.p14)	G	0.054		0.515 [0.212–1.250]	0.143
HMDB0304504 Trans-3′-hydroxy cotinine	rs74600139	chr5:44446171 (GRCh38.p14)	C	0.249		0.845 [0.531–1.344]	0.477
	rs8100204	chr19:19282905 (GRCh38.p14)	A	0.201	SUGP1	0.763 [0.472–1.234]	0.270
	rs148195640	chr20:59240551 (GRCh38.p14)	T	0.053	ZNF831	0.754 [0.435–1.306]	0.313
	rs4308248	chr3:134302128 (GRCh38.p14)	G	0.129		0.729 [0.327–1.621]	0.438
HMDB0012273 Palmitic amide	rs13043798	chr20:23283897 (GRCh38.p14)	A	0.269		0.709 [0.446–1.127]	0.146
	rs13132855	chr4:44399250 (GRCh38.p14)	A	0.085	KCTD8	1.038 [0.551–1.954]	0.908
	rs7949816	chr11:60278427 (GRCh38.p14)	A	0.130		0.778 [0.453–1.338]	0.364
	rs2724067	chr7:93831676 (GRCh38.p14)	A	0.246		0.877 [0.552–1.393]	0.579
	rs149210546	chr4:8824426 (GRCh38.p14)	C	0.391		1.458 [0.896–2.375]	0.129
	rs76582834	chr4:164383443 (GRCh38.p14)	C	0.076	MARCHF1	0.568 [0.280–1.152]	0.117
HMDB0000687 L-Leucine	rs76417681	chr2:61231241 (GRCh38.p14)	C	0.062	USP34	0.610 [0.271–1.371]	0.232
	rs17684350	chr10:18374682 (GRCh38.p14)	C	0.080	CACNB2	1.597 [0.814–3.133]	0.173
	rs55828915	chr1:207785790 (GRCh38.p14)	T	0.060	CD46	0.742 [0.335–1.642]	0.462
	rs57192942	chr10:127846701 (GRCh38.p14)	T	0.265		1.415 [0.863–2.320]	0.169
	rs72709073	chr9:69206346 (GRCh38.p14)	C	0.054	TJP2	0.749 [0.331–1.696]	0.488
	rs7182182	chr15:54330440 (GRCh38.p14)	A	0.064	UNC13C	0.848 [0.395–1.822]	0.673
	rs11525305	chr10:6632812 (GRCh38.p14)	A	0.073	LINC02648	1.365 [0.688–2.708]	0.373
	rs74724211	chr19:44209431 (GRCh38.p14)	G	0.097	ZNF227	0.847 [0.447–1.605]	0.610
	rs79500165	chr2:141763014 (GRCh38.p14)	T	0.056	LRP1B, LOC107985779	0.663 [0.271–1.626]	0.370
	rs72805402	chr10:75306714 (GRCh38.p14)	A	0.152	ZNF503	2.150 [1.258–3.674]	0.005
	rs687168	chr17:14441416 (GRCh38.p14)	C	0.172		1.281 [0.657–2.495]	0.467
	rs13388819	chr2:64599202 (GRCh38.p14)	T	0.099	LOC105374773	1.278 [0.681–2.398]	0.445
	rs77464636	chr7:2301325 (GRCh38.p14)	A	0.103	SNX8	0.85 [0.459–1.574]	0.605
	rs7525555	chr1:202170575 (GRCh38.p14)	G	0.138	PTPRVP	0.747 [0.420–1.329]	0.321
	rs7175639	chr15:50210373 (GRCh38.p14)	C	0.206	SLC27A2	1.005 [0.609–1.660]	0.983
	rs117920703	chr9:34250372 (GRCh38.p14)	A	0.060	UBAP1	0.449 [0.185–1.093]	0.078
	rs28438600	chr8:15523840 (GRCh38.p14)	A	0.050		0.598 [0.252–1.418]	0.243
	rs17134252	chr11:99717399 (GRCh38.p14)	A	0.059	CNTN5	0.978 [0.448–2.133]	0.955
	rs55870181	chr14:84615718 (GRCh38.p14)	T	0.272		1.821 [1.123–2.951]	0.015
	rs147699000	chr22:44353381 (GRCh38.p14)	A	0.070		0.901 [0.428–1.899]	0.785
	rs8074518	chr17:14443114 (GRCh38.p14)	G	0.338		0.970 [0.526–1.792]	0.924
	rs73497629	chr9:100688200 (GRCh38.p14)	C	0.242		1.258 [0.768–2.060]	0.362
	rs12429312	chr13:22915866 (GRCh38.p14)	A	0.212	LINC00621	0.908 [0.539–1.530]	0.717
	rs149903005	chr13:66587358–64 (GRCh38.p14)	C	0.126	PCDH9, LOC105370247	1.133 [0.639–2.011]	0.669
HMDB0000448 Adipic acid	rs6057003	chr20:9908557 (GRCh38.p14)	C	0.179		0.853 [0.520–1.400]	0.529
	rs10846689	chr12:124601819 (GRCh38.p14)	T	0.198		1.056 [0.610–1.826]	0.847
	rs73608605	chr8:39240193 (GRCh38.p14)	G	0.086	ADAM32	1.357 [0.730–2.521]	0.334
	rs12361624	chr11:30658520 (GRCh38.p14)	G	0.153		1.422 [0.859–2.354]	0.171
	rs837465	chr12:124534405 (GRCh38.p14)	A	0.188	NCOR2	0.943 [0.539–1.647]	0.835
	rs6964529	chr7:54293427 (GRCh38.p14)	C	0.085		1.064 [0.567–1.993]	0.848
	rs6739384	chr2:56142078 (GRCh38.p14)	A	0.064	LOC105374690	0.527 [0.248–1.124]	0.097
	rs9291437	chr4:22163632 (GRCh38.p14)	C	0.426		1.163 [0.710–1.907]	0.548
HMDB0013205 9-Decenoylcarnitine	rs117445640	chr4:179766591 (GRCh38.p14)	T	0.116		0.963 [0.556–1.669]	0.894
	rs17116178	chr11:113956604 (GRCh38.p14)	T	0.107		0.787 [0.442–1.399]	0.414
	rs2836817	chr21:39001801 (GRCh38.p14)	C	0.193	LINC02940	0.639 [0.390–1.049]	0.077
	rs71364155	chr17:12218522 (GRCh38.p14)	T	0.380		1.101 [0.688–1.762]	0.688
	rs1532216	chr12:99210187 (GRCh38.p14)	A	0.063	ANKS1B	1.186 [0.605–2.325]	0.619
HMDB0003759 5α-Pregnane-3,20-dione	rs74869776	chr12:30963077 (GRCh38.p14)	G	0.050	TSPAN11	0.716 [0.319–1.607]	0.418
	rs79255083	chr4:66526962 (GRCh38.p14)	C	0.069		1.104 [0.561–2.173]	0.775
	rs7760758	chr6:24036861 (GRCh38.p14)	A	0.062		0.636 [0.295–1.369]	0.247
	rs9792660	chr9:29371008 (GRCh38.p14)	T	0.205		0.802 [0.497–1.295]	0.367

Genome-wide association analysis of the significant pancreatic cancer-related metabolites selected by XGBoost was performed. SNPs with significant associations (p ≤ 5 × 10^–6) are presented. OR [CI] and p were derived from logistic regression of pancreatic cancer with adjustment for sex. Values reported in bold are statistically significant in the logistic regression evaluating the association between pancreatic cancer and the EA of each SNP (p < 0.05)

EA effect allele, EAF effect allele frequency
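The odds ratios, confidence intervals, and p-values in Table 3 are linked through the logistic regression coefficient and its standard error. The minimal sketch below recovers an approximate Wald p-value from a reported OR and its interval. It assumes the intervals are 95% Wald intervals, which the text does not state, and the function name is hypothetical.

```python
from math import erfc, log, sqrt
from statistics import NormalDist

def wald_from_or(odds_ratio, lower, upper, level=0.95):
    """Recover beta, SE, z, and a two-sided p-value from an OR and its CI."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    beta = log(odds_ratio)
    se = (log(upper) - log(lower)) / (2 * z_crit)
    z = beta / se
    p = erfc(abs(z) / sqrt(2))  # two-sided normal tail probability
    return beta, se, z, p

# rs2370981 (protective) and rs72805402 (risk-increasing) from Table 3.
p_rs2370981 = wald_from_or(0.371, 0.142, 0.968)[3]
p_rs72805402 = wald_from_or(2.150, 1.258, 3.674)[3]
```

Under these assumptions the recovered p-values are close to the reported 0.043 and 0.005; small discrepancies are expected because the table values are rounded.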

Network analysis between metabolomic biomarkers and clinical/biochemical indicators

We divided the subjects into each SNP’s effect allele carrier and non-carrier groups. Then, clinical/biochemical indicators and pair metabolites of the SNP were used to create network models based on the z-score obtained after comparing the pancreatic cancer incidence and control groups for each variable and the partial correlation values between them (Fig. 1).

Fig. 1. The network between metabolites and clinical/biochemical indicators in each SNP group. *ALB* albumin, *ALP* alkaline phosphatase, *ALT* alanine aminotransferase, *AST* aspartate aminotransferase, *BIL* bilirubin, *BMI* body mass index, *BUN* blood urea nitrogen, *CHO* total cholesterol, *CRE* creatinine, *DBP* diastolic blood pressure, *FBS* fasting blood sugar, *GGT* gamma-glutamyltransferase, *HDL* high-density lipoprotein, *LDL* low-density lipoprotein, *SBP* systolic blood pressure, *TG* triglyceride, *URIC* uric acid, *WBC* white blood cell. Each node represents a metabolite or a clinical/biochemical indicator; an edge between two nodes indicates a partial correlation. The color of the nodes represents the z-score when comparing the pancreatic cancer incidence and control groups. Positive and negative correlations are represented using light-red and light-blue edges, respectively. Thicker edges represent stronger correlations between the two nodes

As a result, the pair metabolites of rs2370981, rs55870181, rs59519100, and rs72805402 displayed significantly different partial correlation network patterns with the clinical/biochemical indicators when the effect allele carrier and non-carrier groups of each SNP were compared. In summary, the risk allele carriers of rs2370981 showed several significant partial correlations that were not detected in the non-risk allele carriers: eicosa-11,14,17-trienoic acid with low-density lipoprotein (LDL) (r = 0.613, p = 0.045), alanine aminotransferase (ALT) (r = 0.632, p = 0.037), white blood cell (r = 0.816, p = 0.002), body mass index (r = −0.636, p = 0.036), and creatinine (r = −0.67, p = 0.024). Moreover, a significant negative partial correlation between γ-glutamyl tyrosine and aspartate aminotransferase (AST) (r = −0.237, p = 0.049) was observed in the risk allele carriers of rs59519100. Finally, L-leucine exhibited notable partial correlations with a few clinical/biochemical indicators. Partial correlations of L-leucine with diastolic blood pressure (r = 0.18, p = 0.046) and with glucose (r = −0.259, p = 0.004) were identified in the risk allele carriers of rs55870181. In addition, in the non-risk allele carriers of rs72805402, L-leucine positively correlated with the blood urea nitrogen level (r = 0.137, p = 0.049) and negatively correlated with high-density lipoprotein (r = −0.146, p = 0.035).

Mediation and moderation analyses

Mediation analysis, adjusted for age and sex, was conducted on the selected metabolites and SNP biomarkers for pancreatic cancer. A significant result was found for the pairing of γ-glutamyl tyrosine and rs59519100. Although rs59519100 showed no significant direct effect on pancreatic cancer incidence (β = 0.069, p = 0.242), γ-glutamyl tyrosine mediated an indirect effect of rs59519100 on pancreatic cancer incidence (β = 0.056, p = 0.002), with the causal mediation effect accounting for 44.6% of the total effect (Fig. 2).
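As a rough check on the mediated proportion, the indirect effect can be divided by the sum of the indirect and direct effects. The Python sketch below applies this to the rounded coefficients reported above; it assumes that the total effect equals the sum of the two, and the helper name `proportion_mediated` is hypothetical. The result (about 44.8%) is close to, but not identical with, the reported 44.6%, a gap that would be expected if the published figure was derived from unrounded estimates.

```python
def proportion_mediated(indirect: float, direct: float) -> float:
    """Share of the total effect that is carried by the mediator."""
    total = indirect + direct  # assumption: total effect = indirect + direct
    if total == 0:
        raise ValueError("total effect is zero; the proportion is undefined")
    return indirect / total

# Rounded coefficients reported in the text for rs59519100.
indirect_effect = 0.056  # effect mediated by gamma-glutamyl tyrosine
direct_effect = 0.069    # direct effect, not significant (p = 0.242)

share = proportion_mediated(indirect_effect, direct_effect)
print(f"{share:.1%}")
```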

Fig. 2. Mediation and moderation analysis. The result of the mediation analysis is presented in the blue circle and that of the moderation analysis in the red circle. Adjusted odds ratios (AOR) and confidence intervals are indicated with points and lines on the graph. Variables marked with a are derived from the age- and sex-adjusted model. The variable marked with b is derived from the age-, sex-, and smoking status-adjusted model.

Next, we conducted a moderation analysis, adjusted for age and sex, to explore the effect of smoking status as a moderator of the association among γ-glutamyl tyrosine, rs59519100, and pancreatic cancer (Fig. 2). The level of γ-glutamyl tyrosine was negatively associated with pancreatic cancer risk (β = −0.504, p < 0.001), and this association persisted after additional adjustment for smoking status (β = −0.508, p < 0.001). When the interaction term (smoking status × γ-glutamyl tyrosine) was added to the linear model, it was positively associated with pancreatic cancer risk (β = 0.666, p = 0.033). In other words, smoking status affected the association between γ-glutamyl tyrosine and pancreatic cancer risk. Smoking did not significantly moderate the other associations (Additional file 1: Figure S3).
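The role of the interaction term can be illustrated with a small Python sketch. In a model with a product term, the slope of the metabolite on the log-odds scale differs between smoking groups by the interaction coefficient. The sketch assumes a binary smoker indicator (the study records never, ever, and current smokers, and the coding used is not given here); the main-effect coefficient `B_MET` is a placeholder rather than a reported estimate, and only `B_INT` is taken from the text.

```python
import math

B_MET = -0.9   # hypothetical metabolite coefficient in the interaction model
B_INT = 0.666  # interaction coefficient reported in the text

def metabolite_slope(b_met: float, b_int: float, smoker: int) -> float:
    """Change in log-odds per unit of metabolite for a given smoking group."""
    return b_met + b_int * smoker

for smoker in (0, 1):
    slope = metabolite_slope(B_MET, B_INT, smoker)
    # exp(slope) is the odds ratio per one-unit increase in the metabolite
    print(smoker, round(slope, 3), round(math.exp(slope), 3))
```

With these placeholder values the inverse association is weaker among smokers (slope −0.234) than among non-smokers (slope −0.9), which is the pattern a positive interaction term implies when the main effect is negative.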

Evaluation of the predictive power as a biomarker for pancreatic cancer

Figure 3 depicts the prediction models built from conventional risk factors and the significant biomarkers identified in the present research. In the total subjects (n = 349), the prediction model consisting of age, sex, and CA 19–9 yielded an area under the curve (AUC) of 0.569 [0.484–0.654], and the conventional model with age, sex, smoking status (never, ever, current), and CA 19–9 yielded an AUC of 0.564 [0.480–0.649]. On adding five SNP biomarkers (i.e., rs2370981, rs59519100, rs11164375, rs72805402, and rs55870181) and four metabolic biomarkers (i.e., eicosa-11,14,17-trienoic acid, γ-glutamyl tyrosine, lysoPE(18:0/0:0), and L-leucine) to the conventional model, the AUC improved to 0.702 [0.640–0.763]. The highest AUC of 0.738 [0.661–0.815] was observed in the final model consisting of all variables (i.e., age, sex, smoking status, CA 19–9, rs2370981, rs59519100, rs11164375, rs72805402, rs55870181, eicosa-11,14,17-trienoic acid, γ-glutamyl tyrosine, lysoPE(18:0/0:0), and L-leucine). Furthermore, the model using the variables that were significant in the mediation and moderation analyses (i.e., age, sex, smoking status, γ-glutamyl tyrosine, and rs59519100) had an AUC of 0.651 [0.588–0.713], which was within the range of the previously described models.

Fig. 3. ROC curves for the prediction of pancreatic cancer. Prediction models are shown for the total subjects (n = 349), the training set (n = 209), and the test set (n = 140). Each model uses a different set of variables and is displayed in a different color.

The trend in prediction performance was similar when the training (n = 209) and test (n = 140) sets were analyzed separately. In both sets, the final model, in which metabolic and SNP biomarkers were added to the conventional model, exhibited the highest predictive power, a considerable improvement over the conventional model. The final model of the training set had an AUC of 0.843 [0.769–0.918], whereas the conventional model had an AUC of 0.625 [0.526–0.725]. Likewise, the final model of the test set had an AUC of 0.734 [0.618–0.850], while the conventional model had an AUC of 0.568 [0.416–0.719].
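For readers less familiar with the metric, an AUC can be read as the probability that a randomly chosen case receives a higher predicted score than a randomly chosen control. The following Python sketch computes it by direct pairwise comparison, with ties counted as one half; the function name `auc` and the toy scores are hypothetical, and the confidence intervals reported above would require an additional step (for example, resampling) that is not shown.

```python
def auc(scores, labels):
    """AUC as the fraction of case-control pairs ranked correctly.

    scores: predicted risk for each subject
    labels: 1 for a case, 0 for a control
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("both cases and controls are required")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0   # case ranked above control
            elif c == k:
                wins += 0.5   # tie counts as half
    return wins / (len(cases) * len(controls))

# Toy example (hypothetical scores): 3 of the 4 case-control pairs are
# ordered correctly, so the AUC is 0.75.
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_labels = [0, 0, 1, 1]
toy_auc = auc(toy_scores, toy_labels)
print(toy_auc)
```

An AUC of 0.5 corresponds to ranking cases and controls no better than chance, which gives a reference point for the conventional models reported above.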

Discussion

We discovered four metabolites (i.e., eicosa-11,14,17-trienoic acid, γ-glutamyl tyrosine, lysoPE(18:0), and L-leucine) and five SNPs (i.e., rs2370981, rs59519100, rs11164375, rs72805402, and rs55870181) with the potential to act as predictive biomarkers for pancreatic cancer using metabolite-GWAS analysis. As the current study used data obtained from subjects before the onset of pancreatic cancer, no significant difference was noted between the two groups in terms of CA 19–9, which is mainly used to determine the prognosis, treatment effects, and recurrence of pancreatic cancer. Accordingly, the AUC of the conventional model for predicting pancreatic cancer using age, sex, smoking status, and CA 19–9 was only 0.564 [0.480–0.649]. However, when the four metabolites and five SNPs identified in this study were combined, the predictive power for pancreatic cancer increased to 0.702 [0.640–0.763], and, when CA 19–9 was integrated, the predictive power was the highest, with an AUC of 0.738 [0.661–0.815]. In other words, combining CA 19–9, which is not extensively used in screening tests owing to its low specificity, with the biomarkers revealed in our study could improve the predictive potential for the early detection of pancreatic cancer risk. Furthermore, the partial correlation network between each paired metabolite and the clinical/biochemical indicators revealed significantly different patterns between the effect allele carrier and non-carrier groups of rs2370981, rs55870181, rs59519100, and rs72805402, which suggests that the metabolism involving these metabolic biomarkers is associated with genetic predisposition.

Among them, the indirect effect of rs59519100 on pancreatic cancer risk, mediated by γ-glutamyl tyrosine, was demonstrated through mediation analysis. Furthermore, the association between γ-glutamyl tyrosine and pancreatic cancer risk was affected by smoking status. γ-Glutamyl tyrosine is a dipeptide composed of γ-glutamate and tyrosine, a product of incomplete proteolytic breakdown. Although dipeptides have some physiological effects, the metabolic function of γ-glutamyl tyrosine is unclear. We observed a higher serum level of γ-glutamyl tyrosine in the pancreatic cancer incidence group. Abnormal levels of γ-glutamyl dipeptides have been linked to several metabolic disorders in epidemiological studies [13, 14]. Similarly, metabolomics studies have identified several γ-glutamyl dipeptides related to oxidative stress and dysregulated lipid profiles [15, 16], as they are involved in the γ-glutamyl cycle that regenerates intracellular glutathione. As γ-glutamyltransferase (GGT) detoxicates glutathione, increased GGT activity is an important marker of increased oxidative stress. γ-Glutamyl tyrosine, observed in our study, may also contribute to biochemical pathways that induce oxidative stress.

Unexpectedly, γ-glutamyl tyrosine was not significantly correlated with the levels of GGT, ALT, or AST across all subjects of the present study (data not shown). However, a negative partial correlation between γ-glutamyl tyrosine and AST (r = −0.237, p = 0.049) was identified in the risk allele carriers of rs59519100. In other words, subjects with the rs59519100 risk allele showed a high risk of developing pancreatic cancer, and AST and γ-glutamyl tyrosine pointed to metabolic alterations in its etiology. As the liver enzymes (i.e., GGT, ALT, and AST) are closely related to each other, the significance of AST could be connected with the mechanisms of γ-glutamyl tyrosine linked to GGT. Indeed, pancreatic ductal adenocarcinoma patients with elevated AST levels had a considerably shorter overall survival than those with lower AST levels [17]. Furthermore, we discovered a novel SNP, rs59519100, significantly associated with γ-glutamyl tyrosine, in relation to the risk of pancreatic cancer. Further study is therefore needed to clarify the underlying mechanisms of these valuable biomarkers.

Intriguingly, through moderation analysis, we demonstrated that smoking status significantly affected the association between γ-glutamyl tyrosine and pancreatic cancer risk. An association between smoking status and γ-glutamyl tyrosine has not yet been reported; however, liver enzymes (such as GGT, AST, and ALT), which are possibly connected to γ-glutamyl tyrosine, have shown some evidence of association with smoking. Zhang et al. [18] reported that smoking and alcohol drinking synergistically elevated GGT levels in a Chinese population [19, 20]. In a mouse model, maternal smoking exposure during pregnancy increased the severity of non-alcoholic steatohepatitis in offspring mice by increasing their serum ALT, AST, total cholesterol, and triglyceride levels and modulating the phosphorylation of AMP-activated protein kinase [21]. Elucidating the exact metabolic pathways linking these biomarkers, and how smoking modulates them, could facilitate precision medicine or management for pancreatic cancer.

The next notable biomarker is l-leucine, which belongs to the branched-chain amino acids (BCAAs). The breakdown of BCAAs, mainly stored as tissue protein, provides a source for synthesizing other molecules. Consistent with some previous reports, serum l-leucine was elevated in the prediagnostic serum of the pancreatic cancer-incidence group when compared to the control in our research. Mayers et al. observed that subjects with elevated circulating BCAAs in the prediagnostic plasma had more than a two-fold increased risk of pancreatic ductal adenocarcinoma (PDAC) [22]. The leading cause of this increase in plasma BCAAs is tissue protein degradation exceeding the systemic requirement for BCAAs [22, 23], which often occurs in metabolic diseases [24]. Moreover, abnormal physiological functions of the pancreas, including that related to insulin secretion, could directly modulate tissue protein degradation, including that of BCAAs. In all the study subjects, l-leucine was found to be negatively correlated with the levels of glucose (r = − 0.113, p = 0.034), LDL (r = − 0.130, p = 0.015), and uric acid (r = − 0.118, p = 0.031) (data not shown). These findings indicate that higher leucine levels in the pancreatic cancer incidence group may closely reflect the condition of the pancreas during disease progression.

Furthermore, one of the two SNPs associated with L-leucine, rs72805402, mapped to a gene: ZNF503 (zinc finger protein 503), which functions as a transcriptional repressor. The leucine-rich SCAN domain of zinc finger proteins participates in protein–protein interactions, thereby inducing various transcription activities [25]. ZNF503 has been reported to act as an essential regulator during development and during tumor initiation in multiple carcinomas [26, 27], but not in pancreatic cancer. Therefore, our data provide a candidate gene for diagnostic and therapeutic strategies for pancreatic cancer. The different network patterns in the risk allele carrier and non-carrier groups provide a comprehensive insight into the relationships among SNPs, metabolites, and clinical indicators in pancreatic cancer incidence.

Finally, eicosa-11,14,17-trienoic acid, which was associated with rs2370981 mapped to NRXN3 (neurexin 3), is a long-chain fatty acid on which very few articles have been published [28]. NRXN3 encodes receptor and cell adhesion molecules mainly involved in the nervous system [29]. Accordingly, most mutations in this gene have been reported in neurological diseases; several associations with carcinoma have also been reported, albeit not with pancreatic cancer. Interestingly, hypermethylation of ZNF582, which belongs to the same class of zinc finger proteins as the one associated with L-leucine in our research, regulated the transcription of NRXN3 in nasopharyngeal carcinoma [30]. In addition, the changes in the NRXN3 protein level in the cerebrospinal fluid of patients with Huntington’s disease agreed with the protein and mRNA levels of ZNF503 [31]. Based on this literature, we suggest that SNPs of the two genes discovered in our study could synergistically affect pancreatic cancer risk.

Several limitations of this study should be noted. First, the study design did not classify pancreatic cancer by type. Replicating the results in blood samples collected according to pancreatic cancer stage, with type information, would therefore strengthen the robustness of the biomarkers identified here. Second, the sample size was small for a GWAS; a larger sample would provide greater statistical power and might reveal more meaningful biomarkers. Third, establishing causality and interpreting the underlying mechanisms between biomarkers were challenging with our study design. Instead, we performed moderation, mediation, and network analyses. Additional experimental research is therefore warranted to elucidate the exact mechanism of pathogenesis related to the discovered associations. Finally, the effect of smoking was analyzed using only self-reported smoking status. The impact of smoking should therefore also be examined using other variables, such as the duration and amount of tobacco use.

Despite these limitations, this is the first study to employ metabolite-GWAS for pancreatic cancer in the Korean population. We identified four metabolites (i.e., eicosa-11,14,17-trienoic acid, γ-glutamyl tyrosine, lysoPE(18:0), and L-leucine) and five SNPs (i.e., rs2370981, rs59519100, rs11164375, rs72805402, and rs55870181) with the potential for use as predictive biomarkers for pancreatic cancer risk. In particular, we noted the indirect effect of rs59519100 on pancreatic cancer risk, mediated by γ-glutamyl tyrosine, and found that smoking status affected the newly discovered association between γ-glutamyl tyrosine and pancreatic cancer risk. The difference in the network pattern according to the presence or absence of the risk allele of each SNP is also noteworthy. We therefore believe that the present results can serve as a basis for precision medicine or management of pancreatic cancer.

Supplementary Information

12967_2023_4670_MOESM1_ESM.pdf (1.6 MB, pdf)

Additional file 1: Figure S1. Heatmap of metabolite abundance in each group. Figure S2. Manhattan plot from GWAS. Figure S3. Moderation effect of smoking on association between metabolite and pancreatic cancer risk.

12967_2023_4670_MOESM2_ESM.pdf (26.3 KB, pdf)

Additional file 2: Data S1. Python and R codes used in the current research. Data S2. Characteristics of the divided set from XGBoost.

Acknowledgements

Not applicable

Abbreviations

ALT: Alanine aminotransferase
AST: Aspartate aminotransferase
AUC: Area under the curve
BCAAs: Branched-chain amino acids
CA: Carbohydrate antigen
ESI: Electrospray ionization mode
GGT: Gamma-glutamyltransferase
ISTD: Internal standard
KCPS: Korean Cancer Prevention Study
KEGG: Kyoto encyclopedia of genes and genomes
LDL: Low-density lipoprotein
NRXN3: Neurexin 3
PDAC: Pancreatic ductal adenocarcinoma
QC: Quality control
SNP: Single nucleotide polymorphism
XGBoost: EXtreme Gradient Boosting
ZNF503: Zinc finger protein 503

Author contributions

YH designed the study, conducted experimental analyses, performed statistical analyses, wrote the draft, and revised the manuscript. KJJ designed the study and interpreted the data. UK performed statistical analyses and interpreted the data. CIJ conducted experimental analyses and interpreted the data. KL interpreted the data. SHJ designed the study and provided samples. All authors carefully reviewed the final manuscript and approved it for publication.

Funding

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Ministry of Science and ICT of the Korea government (MSIT) [NRF-2022R1A6A3A01085831].

Availability of data and materials

Some or all datasets generated during and/or analyzed during the current study are not publicly available, but can be made available from the corresponding author upon reasonable request.

Declarations

Ethical approval and consent to participate

All procedures in the studies involving human participants were performed in accordance with the ethical standards of the Institutional Review Board at the Yonsei University Health System under the Helsinki Declaration [IRB number: 4-2022-1136]. Paper-based informed consent forms, stored in a document system after obtaining the necessary signatures, were used to record the intent and to identify the will of the subjects to participate in the research.

Consent for publication

Not applicable.

Competing interests

The authors declare no conflicts of interest.

Footnotes

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1. Hassan MM, Bondy ML, Wolff RA, Abbruzzese JL, Vauthey JN, Pisters PW, et al. Risk factors for pancreatic cancer: case-control study. Am J Gastroenterol. 2007;102(12):2696–2707. doi: 10.1111/j.1572-0241.2007.01510.x.
2. Siegel RL, Miller KD, Fuchs HE, Jemal A. Cancer statistics, 2022. CA Cancer J Clin. 2022;72(1):7–33. doi: 10.3322/caac.21708.
3. Statistics Korea. Korean Statistical Information Service database: Cause of death statistics in 2021. 2022.
4. Yadav D, Lowenfels AB. The epidemiology of pancreatitis and pancreatic cancer. Gastroenterology. 2013;144(6):1252–1261. doi: 10.1053/j.gastro.2013.01.068.
5. Lynch SM, Vrieling A, Lubin JH, Kraft P, Mendelsohn JB, Hartge P, et al. Cigarette smoking and pancreatic cancer: a pooled analysis from the pancreatic cancer cohort consortium. Am J Epidemiol. 2009;170(4):403–413. doi: 10.1093/aje/kwp134.
6. Luo G, Jin K, Deng S, Cheng H, Fan Z, Gong Y, et al. Roles of CA19-9 in pancreatic cancer: biomarker, predictor and promoter. Biochim Biophys Acta Rev Cancer. 2021;1875(2):188409. doi: 10.1016/j.bbcan.2020.188409.
7. Ge L, Pan B, Song F, Ma J, Zeraatkar D, Zhou J, et al. Comparing the diagnostic accuracy of five common tumour biomarkers and CA19-9 for pancreatic cancer: a protocol for a network meta-analysis of diagnostic test accuracy. BMJ Open. 2017;7(12):e018175. doi: 10.1136/bmjopen-2017-018175.
8. Hwang JH, Voortman J, Giovannetti E, Steinberg SM, Leon LG, Kim YT, et al. Identification of microRNA-21 as a biomarker for chemoresistance and clinical outcome following adjuvant therapy in resectable pancreatic cancer. PLoS ONE. 2010;5(5):e10630. doi: 10.1371/journal.pone.0010630.
9. Hasin Y, Seldin M, Lusis A. Multi-omics approaches to disease. Genome Biol. 2017;18(1):83. doi: 10.1186/s13059-017-1215-1.
10. Pettini F, Visibelli A, Cicaloni V, Iovinelli D, Spiga O. Multi-omics model applied to cancer genetics. Int J Mol Sci. 2021;22(11):5751. doi: 10.3390/ijms22115751.
11. Faca VM, Song KS, Wang H, Zhang Q, Krasnoselsky AL, Newcomb LF, et al. A mouse to human search for plasma proteome changes associated with pancreatic tumor development. PLoS Med. 2008;5(6):e123. doi: 10.1371/journal.pmed.0050123.
12. Xiao Y, Bi M, Guo H, Li M. Multi-omics approaches for biomarker discovery in early ovarian cancer diagnosis. EBioMedicine. 2022;79:104001. doi: 10.1016/j.ebiom.2022.104001.
13. Comte B, Monnerie S, Brandolini-Bunlon M, Canlet C, Castelli F, Chu-Van E, et al. Multiplatform metabolomics for an integrative exploration of metabolic syndrome in older men. EBioMedicine. 2021;69:103440. doi: 10.1016/j.ebiom.2021.103440.
14. Saoi M, Sasaki K, Sagawa H, Abe K, Kogiso T, Tokushige K, et al. High throughput screening of serum γ-Glutamyl dipeptides for risk assessment of nonalcoholic steatohepatitis with impaired glutathione salvage pathway. J Proteome Res. 2020;19(7):2689–2699. doi: 10.1021/acs.jproteome.9b00405.
15. Zheng Y, Yu B, Alexander D, Steffen LM, Boerwinkle E. Human metabolome associates with dietary intake habits among African Americans in the atherosclerosis risk in communities study. Am J Epidemiol. 2014;179(12):1424–1433. doi: 10.1093/aje/kwu073.
16. Zierer J, Kastenmüller G, Suhre K, Gieger C, Codd V, Tsai PC, et al. African Americans in the atherosclerosis risk in communities study. Am J Epidemiol. 2014;179(12):1424–1433. doi: 10.1093/aje/kwu073.
17. He M, Liu Y, Huang H, Wu J, Wu J, Wang R, et al. Serum aspartate aminotransferase is an adverse prognostic indicator for patients with resectable pancreatic ductal adenocarcinoma. Lab Med. 2023. doi: 10.1093/labmed/lmad014.
18. Zhang Z, Ma L, Geng H, Bian Y. Effects of smoking, and drinking on serum gamma-glutamyl transferase levels using physical examination data: a cross-sectional study in Northwest China. Int J Gen Med. 2021;14:1301–1309. doi: 10.2147/IJGM.S301900.
19. Wannamethee SG, Shaper AG. Cigarette smoking and serum liver enzymes: the role of alcohol and inflammation. Ann Clin Biochem. 2010;47(Pt 4):321–326. doi: 10.1258/acb.2010.009303.
20. Csordas A, Bernhard D. The biology behind the atherothrombotic effects of cigarette smoke. Nat Rev Cardiol. 2013;10(4):219–230. doi: 10.1038/nrcardio.2013.8.
21. Yang D, Kim JW, Jeong H, Kim MS, Lim CW, Lee K, et al. Effects of maternal cigarette smoke exposure on the progression of nonalcoholic steatohepatitis in offspring mice. Toxicol Res. 2022;39(1):91–103. doi: 10.1007/s43188-022-00153-1.
22. Mayers JR, Wu C, Clish CB, Kraft P, Torrence ME, Fiske BP, et al. Elevation of circulating branched-chain amino acids is an early event in human pancreatic adenocarcinoma development. Nat Med. 2014;20(10):1193–1198. doi: 10.1038/nm.3686.
23. Ferguson D, Eichler SJ, Yiew NKH, Colca JR, Cho K, Patti GJ, et al. Mitochondrial pyruvate carrier inhibition initiates metabolic crosstalk to stimulate branched chain amino acid catabolism. Mol Metab. 2023;70:101694. doi: 10.1016/j.molmet.2023.101694.
24. Sivanand S, Vander Heiden MG. Emerging roles for branched-chain amino acid metabolism in cancer. Cancer Cell. 2020;37(2):147–156. doi: 10.1016/j.ccell.2019.12.011.
25. Li X, Han M, Zhang H, Liu F, Pan Y, Zhu J, et al. Structures and biological functions of zinc finger proteins and their roles in hepatocellular carcinoma. Biomark Res. 2022;10(1):2. doi: 10.1186/s40364-021-00345-1.
26. Yin G, Liu Z, Wang Y, Sun L, Wang L, Yao B, et al. ZNF503 accelerates aggressiveness of hepatocellular carcinoma cells by down-regulation of GATA3 expression and regulated by microRNA-495. Am J Transl Res. 2019;11(6):3426–3437.
27. Shahi P, Wang CY, Lawson DA, Slorach EM, Lu A, Yu Y, et al. ZNF503/Zpo2 drives aggressive breast cancer progression by down-regulation of GATA3 expression. Proc Natl Acad Sci U S A. 2017;114(12):3169–3174. doi: 10.1073/pnas.1701690114.
28. The Human Metabolome Database. https://hmdb.ca/metabolites/HMDB0244373
29. Kamal N, Jafari Khamirani H, Dara M, Dianatpour M. NRXN3 mutations cause developmental delay, movement disorder, and behavioral problems: CRISPR edited cells based WES results. Gene. 2023;867:147347. doi: 10.1016/j.gene.2023.147347.
30. Zhao Y, Hong XH, Li K, Li YQ, Li YQ, He SW, et al. ZNF582 hypermethylation promotes metastasis of nasopharyngeal carcinoma by regulating the transcription of adhesion molecules Nectin-3 and NRXN3. Cancer Commun. 2020;40(12):721–737. doi: 10.1002/cac2.12104.
31. Fang Q, Strand A, Law W, Faca VM, Fitzgibbon MP, Hamel N, et al. Brain-specific proteins decline in the cerebrospinal fluid of humans with Huntington disease. Mol Cell Proteomics. 2009;8(3):451–466. doi: 10.1074/mcp.M800231-MCP200.
