CONTEXT SUMMARY
Key Objective
It has been challenging to accurately determine individual risk levels for pancreatic cancer when considering only the main effects of known risk factors. Identifying interaction risk effects between genetic variants and smoking behavior may provide more precise stratification of high-risk subjects.
Knowledge Generated
We utilize the UK Biobank to identify one genome-wide significant variant-smoking interaction effect and several other loci with suggestive evidence of interaction effects. Functional annotation analysis and region-based replication studies support the original discoveries. Risk models accounting for interaction effects show improved stratification performance compared to approaches utilizing only genetic main effects.
Relevance
We demonstrate that accounting for interaction effects between genetic and non-genetic risk factors can result in more precise assessments of individual pancreatic cancer risk. Improved identification of high-risk subjects could lead to both more effective preventative measures and earlier diagnosis strategies in pancreatic cancer, a deadly disease often diagnosed in late stages.
Purpose:
Pancreatic cancer is a deadly disease most often diagnosed in late stages. Identification of high-risk subjects could both contribute to preventative measures and help diagnose the disease at earlier timepoints. However, known risk factors, assessed independently, are currently insufficient for accurately stratifying patients. We utilize large-scale data from the UK Biobank (UKB) to identify genetic variant-smoking interaction effects and show their importance in risk assessment.
Methods:
We draw data from 15,086,830 genetic variants and 315,512 individuals in the UKB. There are 765 cases of pancreatic cancer. Crucially, robust resampling corrections are utilized to overcome well-known challenges in hypothesis testing for interactions. Replication analysis is conducted in two independent cohorts totaling 793 cases and 570 controls. Integration of functional annotation data and construction of polygenic risk scores demonstrate the additional insight provided by interaction effects.
Results:
We identify the genome-wide significant variant rs77196339 on chromosome 2 (per minor allele odds ratio in never smokers=2.31, 95% confidence interval (1.69, 3.15); per minor allele odds ratio in ever-smokers=0.53, 95% confidence interval (0.30, 0.91), ) as well as eight other loci with suggestive evidence of interaction effects (). The rs77196339 region association is validated () in the replication sample. Polygenic risk scores (PRS) incorporating interaction effects show improved discriminatory ability over PRS of main effects alone.
Conclusions:
This study of genome-wide germline variants found that smoking modifies the effect of rs77196339 on pancreatic cancer risk. Interactions between known risk factors can provide critical information for identifying high-risk subjects, given the relative inadequacy of models considering only main effects, as demonstrated with polygenic risk scores. Further studies are necessary to advance towards comprehensive risk prediction approaches for pancreatic cancer.
INTRODUCTION
Pancreatic cancer (PC) is the third leading cause of cancer-related deaths in the United States.1 One of the main difficulties in treating PC is the challenge of late diagnosis. Approximately half of cases present as metastatic, and the five-year overall survival of these patients is less than 5%. Only 15–20% of patients are diagnosed with resectable disease, and five-year overall survival rates after R0 surgical resection are approximately 30% and 10% for node-negative and node-positive disease, respectively.1,2
It is thus of great interest to precisely identify populations at high risk of pancreatic cancer, so that appropriate prevention and surveillance strategies can be developed. Many studies have identified factors that contribute to increased risk of PC, including germline genetic variants, cigarette smoking, diabetes, chronic pancreatitis, alcohol use, and obesity.3 However, it has been challenging to integrate all the aforementioned factors into a model that can accurately stratify the general population.4 The lack of robust prediction approaches suggests both the existence of additional risks that have not been identified as well as more complex interplay between known characteristics. One particular effect that has been underexplored is the interaction between germline variants and smoking.5
The genetic etiology of PC has been investigated in many previous studies, and estimates of heritability range from 21–36%.6 Family-based studies have identified pathogenic variants in a number of genes, for example DNA mismatch repair genes and BRCA1/2, to be associated with PC.7–9 Genome-wide studies have also identified dozens of risk loci in both European and Asian cohorts, with distinct heterogeneity between the populations.10–12 However, known susceptibility regions explain only a small fraction of the estimated heritability.13 This missing heritability could be explained by rare variants, variants with small effects that are difficult to detect, or interaction effects, among other possible causes.14
The effect of smoking on PC risk has also been investigated extensively in previous literature. For example, current smokers are estimated to possess an 80% increased risk of pancreatic cancer compared to never smokers.15 A number of genetic loci have been associated with smoking, including most notably nicotinic receptor genes on chromosome 15.16 As with PC, variants associated with smoking traits explain only a fraction of the estimated heritability of these traits.16
Few studies have investigated the interaction between germline single nucleotide polymorphisms (SNPs) and smoking. Validated interaction risk loci are scarce,17 despite the strong, well-known interplay among genetic factors, smoking, and PC. One previous investigation using data from the PanScan and PanC4 cohorts identified a single locus near the TMEM163 gene on 2q21.3.5 No other study has added findings to the genome-wide association study (GWAS) Catalog.17,18 The difficulty of testing for gene-environment interaction effects has been discussed extensively in both the genetics and statistics literature.19–21 Standard methods similar to the approaches used for GWAS are known to produce invalid hypothesis tests for interaction terms.21
To overcome the aforementioned challenges, this work harnesses large-scale data from one of the largest publicly available genetic compendiums, the UK Biobank (UKB),22 to conduct a genome-wide interaction study, identifying genetic variants that interact with smoking to perturb pancreatic cancer risk. We integrate varied functional annotation data to characterize the roles of risk variants. Interaction polygenic risk scores for PC are introduced and constructed to demonstrate the advantages of considering interactions as opposed to main effects only.
METHODS
Study population
The design of the UK Biobank, a database containing approximately 500,000 individuals, has been described at length in previous publications.22 We performed a variety of quality control measures (Data Supplement, Supplementary Methods). Only individuals with white British ancestry were included to avoid population stratification. The final total sample size was n=315,512 with full covariate data on age, sex, and smoking status. Of these 315,512, we identified 765 individuals with pancreatic cancer. Further information about these subjects, including the distributions of other risk factors such as body mass index, is available in the Data Supplement (Table S1).
Replication analysis was conducted in 793 cases and 570 controls recruited at the University of Texas MD Anderson Cancer Center for the PanScan II10 and PanC423 studies. Both studies have been described in past literature, and we utilized post-processed data in the forms previously reported.10,23
Genotype and smoking data
Standard filtering was also applied to biobank genotype data. Variants with missingness greater than 5%, imputation score less than 0.5, or Hardy-Weinberg equilibrium p-values less than were excluded. We then only retained variants with a minor allele frequency greater than 0.1%. We only analyzed the variants passing the 0.1% frequency threshold to avoid the unstable results and spurious correlations that can result when performing interaction testing with very rare variants.24 The final number of analyzed SNPs after applying all of the filters in this paragraph was 15,086,830.
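The filtering rules in this paragraph can be sketched as a simple predicate. This is only an illustration under stated assumptions: the `Variant` record, the field names, and the toy variants are hypothetical, and the Hardy-Weinberg cutoff is left as a required argument because its value is not recoverable from the text above (the `1e-6` used in the example is a placeholder, not the study's threshold).

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """Hypothetical per-variant summary record (field names are assumptions)."""
    rsid: str
    missingness: float  # fraction of samples with a missing genotype
    info: float         # imputation quality score
    hwe_p: float        # Hardy-Weinberg equilibrium p-value
    maf: float          # minor allele frequency


def passes_qc(v, hwe_min_p, max_missing=0.05, min_info=0.5, min_maf=0.001):
    """Return True if the variant survives all four filters described in the text."""
    if v.missingness > max_missing:   # missingness greater than 5% is excluded
        return False
    if v.info < min_info:             # imputation score below 0.5 is excluded
        return False
    if v.hwe_p < hwe_min_p:           # HWE p-value below the chosen cutoff is excluded
        return False
    return v.maf > min_maf            # retain only MAF greater than 0.1%


# Placeholder cutoff for illustration only; the study's value is not shown above.
HWE_CUTOFF = 1e-6

toy_variants = [
    Variant("rsA", missingness=0.01, info=0.9, hwe_p=0.5, maf=0.036),
    Variant("rsB", missingness=0.10, info=0.9, hwe_p=0.5, maf=0.20),    # too much missingness
    Variant("rsC", missingness=0.01, info=0.9, hwe_p=0.5, maf=0.0005),  # too rare
]
kept = [v.rsid for v in toy_variants if passes_qc(v, HWE_CUTOFF)]
```

In practice such filters would normally be applied by dedicated genotype-processing tools rather than in a Python loop; the sketch only makes the order and direction of the thresholds explicit.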
Smoking information was abstracted from the UK Biobank questionnaire and was coded as ever vs. never smoked. This simplistic coding was chosen as more complex representations could lead to bias.21 Further details are available in the Data Supplement.
Detection of variant-smoking interactions
We utilized a conventional logistic regression model to detect variants whose effects on PC risk were modified by smoking. The model included terms for the first ten principal components, age, sex, ever-smoker status, the SNP, and the SNP-by-ever-smoker interaction. The SNP was coded under an additive genetic model. We started by performing a standard one degree of freedom Wald test on the interaction term only for all variants across the genome.
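To make the role of the interaction term concrete, the sketch below shows how the two stratum-specific odds ratios follow from the fitted coefficients, together with a one degree of freedom Wald p-value. The notation is ours rather than the authors': with SNP coefficient `beta_snp` and interaction coefficient `beta_int`, the per-allele odds ratio is `exp(beta_snp)` in never-smokers and `exp(beta_snp + beta_int)` in ever-smokers. The coefficients in the example are back-calculated from the odds ratios reported for rs77196339, not taken from the fitted model.

```python
import math


def stratum_odds_ratios(beta_snp, beta_int):
    """Per-allele odds ratios implied by a logistic model with a SNP x smoking term."""
    or_never = math.exp(beta_snp)             # smoking indicator = 0
    or_ever = math.exp(beta_snp + beta_int)   # smoking indicator = 1
    return or_never, or_ever


def wald_p_value(beta, se):
    """Two-sided p-value of a one degree of freedom Wald test (z = beta / se)."""
    z = beta / se
    return math.erfc(abs(z) / math.sqrt(2.0))


# Coefficients implied by the reported odds ratios (2.31 and 0.53).
beta_snp = math.log(2.31)
beta_int = math.log(0.53 / 2.31)   # log of the ratio of the two odds ratios
or_never, or_ever = stratum_odds_ratios(beta_snp, beta_int)
```

Note that `wald_p_value` is the uncorrected test; the text explains why this standard p-value is not trusted for interaction terms without further correction.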
A major difficulty in detecting interaction effects, compared with main effects, is the addition of the interaction coefficient to the model. To properly estimate standard errors and produce valid p-values, we corrected results using the bootstrap inference with corrected sandwich (BICS) approach (see Data Supplement), designed specifically for genome-wide interaction studies.25 Because the bootstrapping procedure is computationally expensive, BICS was applied only to variants with an initial standard regression p-value of less than 0.01. The standard genome-wide threshold of $5 \times 10^{-8}$ was used to declare significance.
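The two-stage logic described here (cheap test everywhere, expensive correction only below a screening threshold) can be sketched as follows. This does not implement BICS: `correct_fn` is a hypothetical stand-in for whatever corrected test is used, and the default of `5e-8` is the conventional genome-wide significance level, stated here as an assumption.

```python
def two_stage_screen(initial_p, correct_fn, screen_threshold=0.01, gw_threshold=5e-8):
    """Apply an expensive correction only to variants passing an initial screen.

    initial_p  : dict mapping SNP id -> uncorrected interaction p-value
    correct_fn : callable returning a corrected p-value for a SNP id
                 (hypothetical placeholder for the resampling-based correction)
    """
    corrected = {}
    for snp, p in initial_p.items():
        if p < screen_threshold:          # only these are worth the bootstrap cost
            corrected[snp] = correct_fn(snp)
    significant = sorted(s for s, p in corrected.items() if p < gw_threshold)
    return corrected, significant


# Toy inputs for illustration only; none of these values come from the study.
toy_initial = {"snp1": 0.5, "snp2": 0.005, "snp3": 0.0001}
toy_corrected_lookup = {"snp2": 1e-3, "snp3": 1e-9}
corrected, significant = two_stage_screen(toy_initial, toy_corrected_lookup.get)
```

The design trade-off is that a variant whose uncorrected p-value is above the screen is never corrected, so the screen must be loose enough that truly associated variants are unlikely to be missed.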
Integration of functional annotation data
We integrated dozens of publicly available genetic annotations to characterize the possible functional behaviors of variants surrounding identified risk loci. These annotations included gene expression regulation annotations such as histone acetylation marks from the ENCODE project.26 We also investigated conservation measures such as evolutionary selection scores. Individual scores under each category were summarized in annotation principal components, with higher values indicating more functionality.27 Expression quantitative trait locus (eQTL) data was queried from the Genotype-Tissue Expression (GTEx) project.28
Interaction polygenic risk scores
We first created genetic main effect polygenic risk scores for subjects using previously-published models. Specifically, a baseline PRS that utilized 48 variants pulled from four previous studies was chosen for its aggregation of multiple analyses.29–32 Weights for each SNP were taken from a previous publication.4 For each subject $i$, the main effects PRS (mPRS) was calculated as the sum of weighted counts for all 48 SNPs, $\text{mPRS}_i = \sum_{j=1}^{48} w_j G_{ij}$, where $G_{ij}$ was the number of risk alleles that subject $i$ possessed for variant $j$, and $w_j$ was the weight for variant $j$.
A score with interaction effects was further computed by selecting the most significant variant at each locus with BICS and combining them with three additional interaction SNPs identified in the GWAS catalog that were also in our sample. The interaction PRS (intPRS) for subject $i$ was calculated as $\text{intPRS}_i = \sum_{k} \beta_k G_{ik} E_i$, where the sum runs over the selected interaction SNPs, $E_i$ was a binary indicator taking the value 1 if subject $i$ ever smoked, and $\beta_k$ was the interaction regression coefficient for SNP $k$. An overall PRS was calculated as the sum of mPRS and intPRS. All scores were evaluated in a sample with 1:10 case control ratio, with all cases and selected controls matched by age, to allow for comparison to prior work.4 Both the mPRS and intPRS were also standardized to have mean 0 and variance 1 as in previous literature.4 As a sensitivity analysis, we also created an interaction score allowing $E_i$ to be an indicator for current smoker instead of ever smoker.
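The score construction above can be sketched in a few lines. The function names, the toy genotypes, and the toy weights are all hypothetical, and the order of operations (standardize each score, then add) is our reading of the text rather than a confirmed detail of the authors' pipeline.

```python
import statistics


def main_effect_prs(genotypes, weights):
    """Weighted sum of risk-allele counts for one subject."""
    return sum(w * g for w, g in zip(weights, genotypes))


def interaction_prs(genotypes, betas, ever_smoker):
    """Weighted sum of allele counts times the smoking indicator for one subject."""
    if not ever_smoker:
        return 0.0   # the indicator is 0, so every term vanishes
    return float(sum(b * g for b, g in zip(betas, genotypes)))


def standardize(scores):
    """Rescale scores to mean 0 and (population) variance 1."""
    mean = statistics.fmean(scores)
    sd = statistics.pstdev(scores)
    return [(s - mean) / sd for s in scores]


# Toy data for three subjects; values are invented for illustration.
main_geno = [[0, 1, 2], [1, 1, 0], [2, 0, 1]]
main_weights = [0.1, 0.2, 0.3]
int_geno = [[1, 0], [0, 1], [2, 1]]
int_betas = [-1.5, 0.5]
smoker = [1, 0, 1]

mprs = [main_effect_prs(g, main_weights) for g in main_geno]
intprs = [interaction_prs(g, int_betas, e) for g, e in zip(int_geno, smoker)]
combined = [m + i for m, i in zip(standardize(mprs), standardize(intprs))]
```

A never-smoker always receives an intPRS of zero before standardization, so the interaction score only separates subjects within the ever-smoker group.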
Region-based replication analysis
We performed replication analysis with the variant-Set Test for Association using Annotation infoRmation (STAAR).33 A region-based approach was chosen because power calculations showed that the replication sample was underpowered to replicate individual low-frequency variants (Data Supplement, Supplementary Methods). More details regarding replication analysis are available in the Data Supplement.
RESULTS
Genome-wide analysis
We performed the BICS correction for 84,251 variants with an initial standard interaction p-value less than 0.01, and after correction we identified the variant rs77196339 at 2q32 to be a PC risk variant with effect modified by smoking at a genome-wide significant level (Table 1). This SNP is an intergenic variant with minor allele frequency of 3.6%, and the nearest downstream genes include NABP1 and TMEFF2. Another 14 variants demonstrate association at , including a cluster of intronic variants in the MSRA gene on chromosome 8 (Fig 1A, Table 1, Table S2).
Table 1:
Top SNP by smoking interaction effects identified after BICS correction. SNP, single nucleotide polymorphism; Chr:BP, chromosome and base pair in GRCh37 coordinates; MAF, minor allele frequency; OR, odds ratio; Pint, p-value for interaction effect
| SNP | Chr:BP | MAF | OR, Never-Smokers | OR, Ever-Smokers | Pint |
|---|---|---|---|---|---|
| rs77196339 | 2:192442474 | 0.04 | 2.31 (1.69,3.15) | 0.53 (0.3,0.91) | |
| rs145440394 | 10:79833802 | 0.08 | 1.65 (1.26,2.15) | 0.6 (0.4,0.9) | |
| rs117732931 | 14:78999105 | 0.05 | 1.75 (1.3,2.35) | 0.58 (0.36,0.93) | |
| rs138130320 | 6:84011804 | 0.01 | 3.21 (1.62,6.34) | 0.25 (0.05,1.3) | |
| rs12715476 | 3:5548037 | 0.41 | 0.76 (0.64,0.91) | 1.25 (1.01,1.56) | |
Figure 1:

Genome-wide results for testing of SNP by smoking interaction. A. Manhattan plot with even chromosomes in green and odd chromosomes in gold. Each dot is one SNP. Horizontal lines show and . Only 84,251 SNPs with original $p<0.01$ are shown for clarity. Labels of chromosome:position show location of SNPs with for interaction effect. B. QQ-plot with BICS results in blue and standard logistic regression Wald test in red. Each dot is one SNP. The standard (red) testing results are quite deflated, indicating p-values are higher (less significant) than they should be. Lines diverge at x=2, corresponding to $p=0.01$ on the log scale, which is the point where we began to apply BICS.
The diagnostic QQ-plot (Fig 1B) clearly shows the necessity of the BICS correction in detecting interaction effects. Each dot represents the interaction p-value for one SNP, and the dots would fall along the solid diagonal line if they were uniformly distributed as expected. The uncorrected p-values are deflated and fall far below the line, indicating that standard logistic regression tests return p-values that are too large. The BICS line separates from the uncorrected line at the value of 2 on the x-axis, representing our correction threshold of p=0.01 on the log scale, and the BICS p-values remain much more uniform.
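The coordinates behind such a QQ-plot can be computed as in the sketch below. It assumes the common plotting convention of comparing sorted observed p-values with the expected uniform quantiles `(rank - 0.5) / n` on the negative log10 scale; the paper does not state which plotting positions were used, and the toy p-values are invented.

```python
import math


def qq_points(p_values):
    """Return (expected, observed) pairs on the -log10 scale for a QQ-plot."""
    n = len(p_values)
    points = []
    for rank, p in enumerate(sorted(p_values), start=1):
        expected = (rank - 0.5) / n   # uniform quantile under the null
        points.append((-math.log10(expected), -math.log10(p)))
    return points


# The correction threshold p = 0.01 sits at 2 on the -log10 axis.
threshold_on_log_scale = -math.log10(0.01)

# Toy p-values for illustration only.
pts = qq_points([0.5, 0.05, 0.005])
```

Points with an observed coordinate smaller than the expected coordinate fall below the diagonal, which is the deflation pattern described for the uncorrected tests.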
Genetic variant-smoking interaction effects in the rs77196339 region were associated () with pancreatic cancer in the STAAR analysis of the replication cohort (Data Supplement, Table S3). The individual rs77196339 interaction effect was also in the same direction as in the discovery analysis in both replication cohorts, although the individual variant p-value did not reach statistical significance. As mentioned previously, power to replicate this low minor allele frequency variant was lacking (Data Supplement, Supplementary Methods and Table S4).
Sensitivity analysis was performed in the original UKB discovery cohort to determine whether adjustment for other non-genetic risk factors would affect the rs77196339 results. Estimates of the interaction effect and p-value remained remarkably stable (Data Supplement, Table S5) in these additional models.
Epigenetic and evolutionarily conserved annotations
We integrated genome-wide functional annotations to better understand possible mechanisms of risk for all variants demonstrating association at (Fig 2, Fig S1-S3). For rs77196339, we can see that this variant and many nearby variants show strong evidence of regulating gene expression activity. In Figure 2A, the non-circle points show histone mark signals across the region of interest, with the dotted vertical line indicating rs77196339. The y-axis shows the percentile of these values across the genome. We can see some SNPs in the region possess H3K4me1 values that are above the 98th percentile; this mark is often associated with gene enhancers. Values of H3K27ac and H3K4me3, which are also associated with higher transcriptional activity, are elevated as well.
Figure 2:

Functional annotation results for each SNP in the region of rs77196339. A. Epigenetic annotations including maximum ENCODE H3K27ac level over 14 cell lines, maximum H3K4me1 level, and maximum H3K4me3 level. Higher values strongly imply the region regulates transcription processes. Red dot is epigenetic principal component integrating three plotted individual scores and four other epigenetic annotations.27 Dotted vertical represents top SNP location. B. Evolutionary conservation annotations including primate, mammal, and vertebrate PhyloP scores as well as primate and mammal phastCons scores. Higher values imply the region is more conserved across each phylogenetic tree. Red dot is evolutionary conservation principal component integrating five plotted individual scores and three similar conservation annotations.27
The larger red circles are epigenetic annotation principal components calculated using the three aforementioned annotations as well as four other histone marks.33 While this integrated measure falls at approximately the median for rs77196339, values in the region are often in the top quartile, suggesting the possibility that rs77196339 could simply be tagging the effect of another variant. The STAAR replication analysis also supports the possibility that interaction effects are spread across the region.
Analysis using the GTEx database28 version 8 reveals that rs77196339 is a cis-eQTL of TMEFF2 in adipose tissue (Data Supplement; Table S6). Other strongly associated variants also act as eQTLs of nearby genes (Data Supplement; Table S6). These findings are again consistent with epigenetic annotation data.
Annotations describing levels of evolutionary conservation also reinforce the theme that the rs77196339 region supports important functional mechanisms (Fig 2B). The non-circle points show the percentiles of phastCons and PhyloP scores, which measure the amount of evolutionary activity occurring at each location.34,35 Conserved regions across different species are considered more functional and important.36 The percentiles across conservation scores are more mixed than the epigenetic scores, with some very low values, especially those from phastCons. However, the principal component integrating eight total conservation scores shows some larger values in the region, although rs77196339 itself falls around the median.
Polygenic risk scores
We further assessed the clinical impact of studying SNP-smoking interactions by creating polygenic risk scores with and without interaction effects. An mPRS using previously-published variants and weights4 produced an area under the curve (AUC) of 0.59 (95% CI 0.57–0.61). While exact predictive levels may vary from cohort to cohort due to differences in sample selection or other data cleaning steps, this main effects AUC falls broadly in line with summaries that other publications have reported for pancreatic cancer.29 We then added mPRS and intPRS for a combined score. The AUC of the combined risk score was 0.61 (95% CI 0.59–0.63; p-value for difference37 of AUC values = 0.03), succinctly demonstrating the ability of interaction data to provide additional helpful information in stratifying high risk subjects (Fig 3A). The percentage change is relatively large given that, in practice, the AUC would not fall below 0.5 in developing a risk score. Sensitivity analysis that constructed intPRS with a current smoker variable, as opposed to an ever smoker variable, returned roughly the same result (Data Supplement, Fig S4), as did adding other non-genetic risk factors to the score (Data Supplement, Fig S5-S9).
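For readers less familiar with the metric, the sketch below computes an AUC as the probability that a randomly chosen case scores higher than a randomly chosen control (ties counted as one half), and works through the "relative to chance" comparison of the two reported AUC values. The function names and toy scores are ours; the test for a difference between AUCs cited in the text is not implemented here.

```python
def auc(case_scores, control_scores):
    """AUC as the fraction of case-control pairs ranked correctly (ties count 0.5)."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def gain_over_chance(auc_new, auc_old, chance=0.5):
    """Relative improvement of the part of the AUC that exceeds chance."""
    return (auc_new - chance) / (auc_old - chance) - 1.0


# Toy scores for illustration only.
toy_auc = auc([0.9, 0.4], [0.1, 0.4, 0.3])

# Reported AUCs: 0.59 for mPRS alone, 0.61 for the combined score.
relative_gain = gain_over_chance(0.61, 0.59)   # roughly 0.22
```

Measured this way, moving from 0.59 to 0.61 is about a 22% increase in discrimination above chance, which is one way to read the statement that the change is relatively large.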
Figure 3:

Polygenic risk score performance. A. The mPRS area under the curve is broadly similar to the results reported by other polygenic risk score literature in pancreatic cancer. The combined PRS consisting of intPRS and mPRS improves the area under the curve compared to mPRS alone. Results are for 1:10 case:control sample. B. In the entire UK Biobank dataset, the five quintiles of the combined PRS effectively stratify the subjects into five distinct subgroups with different incidence rates of PC.
The combined main effects and interaction PRS also clearly demonstrates the ability to discriminate between high and low-risk subjects in time-to-event analysis. Cumulative incidence curves of UKB subjects stratified by the five quintiles of PRS show a clear difference in time-to-pancreatic cancer once incidence begins rising after age 65 (Fig 3B). When the five quintiles are inserted as categorical covariates into a Cox proportional hazards model, adjusted for age, sex, and smoking status as in the main genome-wide analysis, the hazard ratio comparing the top quintile to the lowest quintile is 3.76 (95% CI 2.83–5.00; p<0.001).
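The quintile stratification step can be sketched as below. This shows only how subjects might be assigned to score quintiles and how a crude case proportion per quintile could be tabulated; it is not the Cox model or the cumulative incidence analysis used in the paper, and the toy scores and case labels are invented.

```python
def quintile_labels(scores):
    """Assign each subject a quintile from 1 (lowest scores) to 5 (highest)."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])   # subject indices by rising score
    labels = [0] * n
    for rank, idx in enumerate(order):
        labels[idx] = rank * 5 // n + 1
    return labels


def case_proportion_by_quintile(labels, is_case):
    """Crude proportion of cases within each non-empty quintile."""
    totals = {q: 0 for q in range(1, 6)}
    cases = {q: 0 for q in range(1, 6)}
    for q, c in zip(labels, is_case):
        totals[q] += 1
        cases[q] += int(c)
    return {q: cases[q] / totals[q] for q in totals if totals[q] > 0}


# Toy data for ten subjects; values are for illustration only.
toy_scores = [5, 1, 9, 3, 7, 0, 8, 2, 6, 4]
toy_cases = [0, 0, 1, 0, 1, 0, 1, 0, 0, 0]
toy_labels = quintile_labels(toy_scores)
toy_rates = case_proportion_by_quintile(toy_labels, toy_cases)
```

A time-to-event analysis would additionally account for follow-up time and censoring, which this crude tabulation ignores.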
DISCUSSION
Improving the outcomes of pancreatic cancer patients depends heavily on both developing novel early detection strategies and identifying new therapeutic approaches. Efforts towards the former goal have identified many risk factors that are known to increase PC risk as main effects. That is, these factors possess a certain effect that exists both with and without the presence of other components, e.g. smoking presents a risk in both diabetic and non-diabetic subjects. However, existing knowledge falls far short of the level necessary to implement broader stratification programs such as population-wide screening.4 One area that has not received much study is the identification of interaction effects between multiple risk factors.
Our analysis of one of the largest publicly available genetic compendiums empirically demonstrates the difficulty of detecting genetic interaction effects (Fig 1B), a challenge that has been discussed at length in the literature. We identify the intergenic variant rs77196339 to possess an interaction effect with smoking at the genome-wide significant level, and various other loci demonstrate suggestive evidence of interaction association at . Integration of varied functional annotation data suggests that the region around rs77196339 is moderately evolutionarily conserved and highly involved in transcription regulation activities. These annotation data are often more reliable than eQTL databases when the variant of interest is rare, which greatly reduces power to identify significant eQTL associations. rs77196339 could also simply be tagging the true causal variant. The nearest gene to rs77196339 is NABP1, which encodes a single-stranded DNA binding protein with critical roles in DNA damage response and maintaining genomic stability.38 It is established that genomic instability is a characteristic of pancreatic cancer,39 and smoking has been linked to genomic instability as well.40
Interpretation of interaction effects is generally difficult given the many possible mechanisms that could exist. The minor allele of rs77196339 shows an odds ratio greater than 1 in non-smokers and less than 1 in smokers. This difference could suggest a mechanism that is deleterious in general but is modified to be somewhat protective in the presence of the many carcinogenic processes introduced by smoking. Further experimental studies are needed to explore and confirm the precise processes at this locus that interact with smoking to perturb PC risk.
The incorporation of interaction effects into polygenic risk scores increases the utility of the combined PRS by an appreciable amount. This increase is expected given that known main effect loci only explain a small fraction of the estimated genetic heritability of PC.13 The missing heritability could be attributable to a number of factors, but interaction effects offer a highly plausible possibility. The increase in PRS performance using interaction effects from only two studies suggests the potential of more improvement in pancreatic cancer PRS as more studies explore interaction effects with smoking and other risk factors.
There are some limitations to our study that should be acknowledged. First, the numbers of pancreatic cancer cases in the UKB and replication datasets are relatively small compared to genetic studies of some other cancer types. This smaller sample size reduces power to detect interaction effects with less common variants (as shown in Table S4); sample sizes and low frequencies could explain why rs77196339 was not discovered in a previous interaction study,5 which identified a different chromosome 2 locus. Additionally, we modeled smoking behavior as a simple binary variable. More complex specifications are possible,5 and differences in categorizing types of smokers could also contribute to differences in results across studies. Finally, it is still necessary to perform experimental studies to further characterize the risk mechanisms that were identified.
In summary, we performed a genome-wide SNP-smoking interaction study in the UKB, which is one of the largest available databases with information on all three of germline genetic data, pancreatic cancer status, and smoking status. We utilized a robust testing correction to identify one genome-wide significant interaction at an intergenic SNP on chromosome 2 near NABP1. Functional annotation data provide evidence that this region regulates gene expression, although further studies are necessary to determine the exact processes involved. Integration of interaction effects into polygenic risk scores demonstrates that identification of genetic interaction effects can potentially increase the ability to precisely stratify high-risk subjects.
ACKNOWLEDGEMENTS
This research has been conducted using the UK Biobank Resource under application number 73569.
Support:
The authors gratefully acknowledge the support of National Cancer Institute grant P30 CA016672.
References:
- 1.Siegel RL, Miller KD, and Jemal A (2019). Cancer statistics, 2019. CA Cancer J. Clin 69, 7–34. [DOI] [PubMed] [Google Scholar]
- 2.Allen PJ, Kuk D, Fernandez-del Castillo C, Basturk O, Wolfgang CL, Cameron JL, Lillemoe KD, Ferrone CR, Morales-Oyarvide V, and He J (2017). Multi-institutional validation study of the American Joint Commission on Cancer changes for T and N staging in patients with pancreatic adenocarcinoma. Ann. Surg 265, 185. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Midha S, Chawla S, and Garg PK (2016). Modifiable and non-modifiable risk factors for pancreatic cancer: A review. Cancer Lett 381, 269–277. [DOI] [PubMed] [Google Scholar]
- 4.Sharma S, Tapper WJ, Collins A, and Hamady ZZ (2022). Predicting pancreatic cancer in the UK Biobank cohort using polygenic risk scores and diabetes mellitus. Gastroenterology 162, 1665–1674. e1662. [DOI] [PubMed] [Google Scholar]
- 5.Mocci E, Kundu P, Wheeler W, Arslan AA, Beane-Freeman LE, Bracci PM, Brennan P, Canzian F, Du M, and Gallinger S (2021). Smoking modifies pancreatic cancer risk loci on 2q21. 3. Cancer Res 81, 3134–3143. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Afghani E, and Klein AP (2022). Pancreatic Adenocarcinoma: Trends in Epidemiology, Risk Factors, and Outcomes. Hematology/Oncology Clinics 36, 879–895. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Childs EJ, Mocci E, Campa D, Bracci PM, Gallinger S, Goggins M, Li D, Neale RE, Olson SH, and Scelo G (2015). Common variation at 2p13. 3, 3q29, 7p13 and 17q25. 1 associated with susceptibility to pancreatic cancer. Nat. Genet 47, 911–916. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Lucas AL, Frado LE, Hwang C, Kumar S, Khanna LG, Levinson EJ, Chabot JA, Chung WK, and Frucht H (2014). BRCA1 and BRCA2 germline mutations are frequently demonstrated in both high‐risk pancreatic cancer screening and pancreatic cancer cohorts. Cancer 120, 1960–1967. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Hu C, Hart SN, Polley EC, Gnanaolivu R, Shimelis H, Lee KY, Lilyquist J, Na J, Moore R, and Antwi SO (2018). Association between inherited germline mutations in cancer predisposition genes and risk of pancreatic cancer. JAMA 319, 2401–2409. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Wolpin BM, Rizzato C, Kraft P, Kooperberg C, Petersen GM, Wang Z, Arslan AA, Beane-Freeman L, Bracci PM, and Buring J (2014). Genome-wide association study identifies multiple susceptibility loci for pancreatic cancer. Nat. Genet 46, 994–1000. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Wu C, Miao X, Huang L, Che X, Jiang G, Yu D, Yang X, Cao G, Hu Z, and Zhou Y (2012). Genome-wide association study identifies five loci associated with susceptibility to pancreatic cancer in Chinese populations. Nat. Genet 44, 62–66. [DOI] [PubMed] [Google Scholar]
- 12.Campa D, Rizzato C, Bauer AS, Werner J, Capurso G, Costello E, Talar-Wojnarowska R, Jamroziak K, Pezzilli R, and Gazouli M (2013). Lack of replication of seven pancreatic cancer susceptibility loci identified in two Asian populations. Cancer Epidemiol. Biomarkers Prev 22, 320–323. [DOI] [PubMed] [Google Scholar]
- 13.Zhang YD, Hurson AN, Zhang H, Choudhury PP, Easton DF, Milne RL, Simard J, Hall P, Michailidou K, and Dennis J (2020). Assessment of polygenic architecture and risk prediction based on common variants across fourteen cancers. Nat. Commun 11, 3353. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Génin E (2020). Missing heritability of complex diseases: case solved? Hum. Genet 139, 103–113. [DOI] [PubMed] [Google Scholar]
- 15.Yuan C, Morales-Oyarvide V, Babic A, Clish CB, Kraft P, Bao Y, Qian ZR, Rubinson DA, Ng K, and Giovannucci EL (2017). Cigarette smoking and pancreatic cancer survival. J. Clin. Oncol 35, 1822. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Xu K, Li B, McGinnis KA, Vickers-Smith R, Dao C, Sun N, Kember RL, Zhou H, Becker WC, and Gelernter J (2020). Genome-wide association study of smoking trajectory and meta-analysis of smoking status in 842,000 individuals. Nat. Commun 11, 5302. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Buniello A, MacArthur JAL, Cerezo M, Harris LW, Hayhurst J, Malangone C, McMahon A, Morales J, Mountjoy E, and Sollis E (2019). The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res 47, D1005–D1012. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.MacArthur J, Bowler E, Cerezo M, Gil L, Hall P, Hastings E, Junkins H, McMahon A, Milano A, and Morales J (2017). The new NHGRI-EBI Catalog of published genome-wide association studies (GWAS Catalog). Nucleic Acids Res 45, D896–D901. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Hutter CM, Mechanic LE, Chatterjee N, Kraft P, Gillanders EM, and Tank NGET (2013). Gene‐environment interactions in cancer epidemiology: a National Cancer Institute Think Tank report. Genet. Epidemiol 37, 643–657. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.Aschard H (2016). A perspective on interaction effects in genetic association studies. Genet. Epidemiol 40, 678–688. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Zhang M, Yu Y, Wang S, Salvatore M, Fritsche G, L., He Z, and Mukherjee B (2020). Interaction analysis under misspecification of main effects: Some common mistakes and simple solutions. Stat. Med 39, 1675–1694. [DOI] [PubMed] [Google Scholar]
- 22.Bycroft C, Freeman C, Petkova D, Band G, Elliott LT, Sharp K, Motyer A, Vukcevic D, Delaneau O, and O’Connell J (2018). The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209.
- 23.Klein AP, Wolpin BM, Risch HA, Stolzenberg-Solomon RZ, Mocci E, Zhang M, Canzian F, Childs EJ, Hoskins JW, and Jermusyk A (2018). Genome-wide meta-analysis identifies five new susceptibility loci for pancreatic cancer. Nat. Commun 9, 1–11.
- 24.Yang T, Chen H, Tang H, Li D, and Wei P (2019). A powerful and data‐adaptive test for rare‐variant–based gene‐environment interaction analysis. Stat. Med 38, 1230–1244.
- 25.Sun R, Carroll RJ, Christiani DC, and Lin X (2018). Testing for gene–environment interaction under exposure misspecification. Biometrics 74, 653–662.
- 26.de Souza N (2012). The ENCODE project. Nat. Methods 9, 1046–1046.
- 27.Li X, Li Z, Zhou H, Gaynor SM, Liu Y, Chen H, Sun R, Dey R, Arnett DK, and Aslibekyan S (2020). Dynamic incorporation of multiple in silico functional annotations empowers rare variant association analysis of large whole-genome sequencing studies at scale. Nat. Genet 52, 969–983.
- 28.GTEx Consortium (2020). The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science 369, 1318–1330.
- 29.Jia G, Lu Y, Wen W, Long J, Liu Y, Tao R, Li B, Denny JC, Shu X-O, and Zheng W (2020). Evaluating the utility of polygenic risk scores in identifying high-risk individuals for eight common cancers. JNCI Cancer Spectr 4, pkaa021.
- 30.Nakatochi M, Lin Y, Ito H, Hara K, Kinoshita F, Kobayashi Y, Ishii H, Ozaka M, Sasaki T, and Sasahira N (2018). Prediction model for pancreatic cancer risk in the general Japanese population. PLoS One 13, e0203386.
- 31.Molina-Montes E, Coscia C, Gómez-Rubio P, Fernández A, Boenink R, Rava M, Márquez M, Molero X, Löhr M, and Sharp L (2021). Deciphering the complex interplay between pancreatic cancer, diabetes mellitus subtypes and obesity/BMI through causal inference and mediation analyses. Gut 70, 319–329.
- 32.Galeotti AA, Gentiluomo M, Rizzato C, Obazee O, Neoptolemos JP, Pasquali C, Nentwich M, Cavestro GM, Pezzilli R, and Greenhalf W (2021). Polygenic and multifactorial scores for pancreatic ductal adenocarcinoma risk prediction. J. Med. Genet 58, 369–377.
- 33.Li X, Li Z, Zhou H, Gaynor SM, Liu Y, Chen H, Sun R, Dey R, Arnett DK, and Aslibekyan S (2020). Dynamic incorporation of multiple in silico functional annotations empowers rare variant association analysis of large whole-genome sequencing studies at scale. Nat. Genet 52, 969–983.
- 34.Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, Clawson H, Spieth J, Hillier LW, and Richards S (2005). Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res 15, 1034–1050.
- 35.Pollard KS, Hubisz MJ, Rosenbloom KR, and Siepel A (2010). Detection of nonneutral substitution rates on mammalian phylogenies. Genome Res 20, 110–121.
- 36.Li X, Yung G, Zhou H, Sun R, Li Z, Hou K, Zhang MJ, Liu Y, Arapoglou T, and Wang C (2022). A multi-dimensional integrative scoring framework for predicting functional variants in the human genome. Am. J. Hum. Genet 109, 446–456.
- 37.DeLong ER, DeLong DM, and Clarke-Pearson DL (1988). Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics 44, 837–845.
- 38.Richard DJ, Bolderson E, Cubeddu L, Wadsworth RI, Savage K, Sharma GG, Nicolette ML, Tsvetanov S, McIlwraith MJ, and Pandita RK (2008). Single-stranded DNA-binding protein hSSB1 is critical for genomic stability. Nature 453, 677–681.
- 39.Campbell PJ, Yachida S, Mudie LJ, Stephens PJ, Pleasance ED, Stebbings LA, Morsberger LA, Latimer C, McLaren S, and Lin M-L (2010). The patterns and dynamics of genomic instability in metastatic pancreatic cancer. Nature 467, 1109–1113.
- 40.Locken-Castilla A, Pacheco-Pantoja EL, Rodríguez-Brito F, May-Kim S, López-Rivas V, and Ceballos-Cruz A (2019). Smoking index, lifestyle factors, and genomic instability assessed by single-cell gel electrophoresis: a cross-sectional study in subjects from Yucatan, Mexico. Clin. Epigenetics 11, 1–14.
