Gut. Author manuscript; available in PMC: 2017 Oct 1.
Published in final edited form as: Gut. 2016 Aug 2;66(10):1739–1747. doi: 10.1136/gutjnl-2016-311622

Germline variation in inflammation-related pathways and risk of Barrett’s esophagus and esophageal adenocarcinoma

Matthew F Buas 1, Qianchuan He 1, Lisa G Johnson 1, Lynn Onstad 1, David M Levine 2, Aaron P Thrift 3, Puya Gharahkhani 4, Claire Palles 5, Jesper Lagergren 6,7, Rebecca C Fitzgerald 8, Weimin Ye 9, Carlos Caldas 10, Nigel C Bird 11, Nicholas J Shaheen 12, Leslie Bernstein 13, Marilie D Gammon 14, Anna H Wu 15, Laura J Hardie 16, Paul D Pharoah 17, Geoffrey Liu 18, Prassad Iyer 19, Douglas A Corley 20,21, Harvey A Risch 22, Wong-Ho Chow 23, Hans Prenen 24, Laura Chegwidden 25, Sharon Love 26, Stephen Attwood 27, Paul Moayyedi 28, David MacDonald 29, Rebecca Harrison 30, Peter Watson 31, Hugh Barr 32, John deCaestecker 33, Ian Tomlinson 5, Janusz Jankowski 34, David C Whiteman 35, Stuart MacGregor 4, Thomas L Vaughan 1,36, Margaret M Madeleine 1,36,*
PMCID: PMC5296402  NIHMSID: NIHMS844114  PMID: 27486097

Abstract

Esophageal adenocarcinoma (EA) incidence has risen sharply in Western countries over recent decades. Local and systemic inflammation, operating downstream of disease-associated exposures, is considered an important contributor to EA pathogenesis. Several risk factors have been identified for EA and its precursor, Barrett’s esophagus (BE), including symptomatic reflux, obesity, and smoking. The role of inherited genetic susceptibility remains an area of active investigation. To explore whether germline variation related to inflammatory processes influences susceptibility to BE/EA, we used data from a genome-wide association study (GWAS) of 2,515 EA cases, 3,295 BE cases, and 3,207 controls. Our analysis included 7,863 single nucleotide polymorphisms (SNPs) in 449 genes assigned to five pathways: cyclooxygenase (COX), cytokine signaling, oxidative stress, human leukocyte antigen, and NFκB. A principal components-based analytic framework was employed to evaluate pathway-level and gene-level associations with disease risk. We identified a significant signal for the COX pathway in relation to BE risk (P=0.0059, FDR q=0.03), and in gene-level analyses found an association with MGST1 (microsomal glutathione-S-transferase 1; P=0.0005, q=0.005). Assessment of 36 MGST1 SNPs identified 14 variants associated with elevated BE risk (q<0.05). Of these, four were subsequently confirmed (P<5.5 × 10⁻⁵) in a meta-analysis encompassing an independent set of 1,851 BE cases and 3,496 controls. Three of these SNPs (rs3852575, rs7312090, rs4149204) were associated with similar elevations in EA risk. This study provides the most comprehensive evaluation of inflammation-related germline variation in relation to risk of BE/EA, and suggests that variants in MGST1 influence disease susceptibility.

Introduction

The incidence of esophageal adenocarcinoma (EA) has risen rapidly over recent decades in Western countries [1, 2]. EA typically arises within a metaplastic precursor epithelium known as Barrett’s esophagus (BE) [3]. Established risk factors for EA and BE include symptomatic gastroesophageal reflux disease (GERD), abdominal adiposity, tobacco smoking, European ancestry, and male sex [3, 4, 5, 6]. A prevailing conceptual model has linked chronic inflammation and genomic instability to EA pathogenesis [3]. Several exposures associated with elevated disease risk, such as GERD, obesity, and smoking, increase levels of local and systemic inflammation, while use of non-steroidal anti-inflammatory drugs (NSAIDs) and statins (which may have anti-inflammatory properties) has been associated with reduced risk [7, 8, 9]. It remains poorly understood, however, whether and to what extent inherited genetic variation in specific genes and pathways implicated in inflammatory signaling modulates disease susceptibility.

A biologic link between chronic inflammation and cancer risk has long been appreciated [10, 11]. Inflammation may act at multiple stages of disease development to disrupt tissue homeostasis, induce aberrant proliferative responses, modulate the tumor microenvironment, and compromise immune surveillance [12, 13, 14]. Inflammatory physiological changes such as oxidative stress are known to exert downstream genotoxic effects [15] and, when sustained over extended periods, can promote the emergence of cancer-initiating mutations. In the esophagus, long-term exposure to gastric acid or bile salts results in the release of pro-inflammatory cytokines (e.g., interleukin-8), activation of nuclear factor kappa-light-chain-enhancer of activated B cells (NFκB) and cyclooxygenase-2 (COX2), alterations in gene expression, and direct damage to the squamous epithelium [16, 17, 18]. Cigarette smoking can also expose the esophagus to harmful toxins while simultaneously inducing systemic inflammatory responses involving cytokine signaling, NFκB activation, and COX pathway stimulation [19, 20, 21]. Abdominal adiposity and obesity have been associated with elevated circulating levels of pro-inflammatory mediators such as tumor necrosis factor-α (TNFα), C-reactive protein (CRP), interleukin-6 (IL-6), and leptin [22, 23]. These elevated markers likely reflect adipose tissue inflammation. Inflammation may therefore sustain pathogenesis at several points and through multiple pathways, from development of early lesions through cancer progression.

Recent large-scale GWAS have provided comprehensive assessments of inherited genetic susceptibility to BE and EA [24, 25, 26, 27, 28]. Novel associations have been identified with variants in or near several transcription factors implicated in embryonic esophageal development, a transcriptional co-activator, and the human leukocyte antigen (HLA) region. It remains likely, however, that additional loci that did not satisfy the commonly used, stringent genome-wide significance threshold (P<5 × 10⁻⁸) may be involved in modifying disease risk. In this regard, pathway-based analytic methods can offer significant advantages over conventional genome-wide analyses. Pathway approaches simultaneously reduce the number of statistical comparisons and increase power by aggregating large numbers of low-magnitude signals [29]; importantly, such methods allow for the systematic analysis of coherent biological processes most likely to be implicated in disease etiology.

Given the central role of inflammation in BE and EA pathogenesis, we examined genetic variation in five inflammation-related pathways (COX, cytokine signaling, oxidative stress, HLA, and NFκB) using a novel principal components-based pathway analysis framework. Using genotyping data from the International Barrett’s Esophagus and Adenocarcinoma Consortium (BEACON) GWAS of 2,515 EA cases, 3,295 BE cases, and 3,207 controls, we selected 7,863 SNPs in 449 genes and assessed associations with risks of BE and EA in a pre-specified tiered fashion: first at the pathway level, next at the gene level, and finally at the SNP level.

Methods

Study population and SNP genotyping

The BEACON GWAS included men and women diagnosed with EA or BE, and control participants, pooled from 14 individual studies conducted in Western Europe, Australia, and North America over the past 20 years. Detailed study population characteristics and genotyping protocols have been published [24]. The current analysis employed a pooled dataset [30] that included participants of European ancestry from the BEACON GWAS, additional BE and EA patients from the UK Barrett’s Esophagus Gene Study and the UK Stomach and Oesophageal Cancer Study (SOCS), respectively [24], and additional control participants from a hospital-based case-control study of melanoma conducted at the MD Anderson Cancer Center (Houston, TX) [31]. Genotyping of buffy coat or whole blood DNA from all participants was conducted using the Illumina Omni1M Quad platform, in accordance with standard quality control procedures [32]. All participants gave written informed consent, and this project was approved by the ethics review board of the Fred Hutchinson Cancer Research Center. We selected all unrelated participants with <2% missing genotype calls; the final study sample thus included 2,515 EA cases, 3,295 BE cases, and 3,207 controls. Three control participants were excluded from analyses involving BE cases because they were related to case participants.

Selection of genes in inflammation-related pathways

Five pathways implicated in chronic inflammation were selected for analysis: 1) cyclooxygenase (COX) (n=40 genes), 2) pro- and anti-inflammatory cytokines (n=198 genes), 3) oxidative stress (n=117 genes), 4) HLA (n=32 genes), and 5) NFκB (n=125 genes). Genes in each of these pathways (Table S1) were selected based on an extensive survey of the prior literature on inflammation in cancer and EA pathogenesis [12, 33, 34, 35, 36, 37] and on public pathway databases (e.g., KEGG, BioCarta).

SNP selection

SNPs selected for this study are located in or near (±2.0 kilobases) the genes chosen for analysis. We excluded SNPs that failed Illumina quality measures or standard quality control procedures [32]. Specifically, SNPs were excluded if any of the following criteria were met: i) Illumina GenTrain score <0.6 or cluster separation <0.4; ii) >5% missing call rate over included samples; iii) discordant genotype calls in any pair of duplicate study samples; iv) Mendelian error in either one of the HapMap QC trios or the small number of families identified in the BEACON data; v) significant departure from Hardy-Weinberg equilibrium (P<10⁻⁴); and vi) minor allele frequency (MAF) <1%. Imputation of missing values for genotyped SNPs was conducted using SHAPEIT [38]. After imposing the above filters, we identified all available Omni1M SNPs located within the selected genes of each inflammation-related pathway. Flanking segments of 2.0 kilobases upstream of the gene start sites and downstream of the 3′ UTRs were also included, based on gene boundaries defined in hg19/GRCh37. No Omni1M SNPs were available for 16 genes initially selected (cytokines: n=14; oxidative stress: n=2) (Table S1). Minor and major alleles are reported throughout using the ‘plus’ strand designation.

Statistical analysis

We examined each of the five inflammation-related pathways using an application of principal components analysis (PCA). We first constructed a genotype matrix comprising all SNPs assigned to the indicated pathway, including case patients of the selected type (BE or EA) and all control participants. Individual SNP variables, coded as 0, 1, or 2 minor alleles, were standardized across participants to have a mean of zero and a standard deviation (SD) of one. We then selected the first N principal components (PCs) that together captured ≥50% of the genotypic variance of the pathway, with a minimum of three (N≥3). The association between a given pathway and risk of BE or EA was assessed using the likelihood ratio test (LRT), comparing two nested logistic regression models: i) a full model containing the N pathway-level PCs (PC1,p, …, PCN,p), age, sex, and the first four PCs derived from ancestry-informative markers (AIM) to account for population stratification (PC1,AIM-PC4,AIM) [30]; and ii) a reduced model containing only age, sex, and PC1,AIM-PC4,AIM. HLA loci were excluded from the set of ancestry-informative markers, as described previously [24]. Pathways were selected for further analysis if the LRT P value remained <0.05 after correction for multiple comparisons (n=5) using the false discovery rate (FDR) method, i.e., if FDR q<0.05.
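The pathway-level test described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' Stata code: the function names (`select_pcs`, `logistic_loglik`, `chi2_sf`, `pathway_lrt`) are hypothetical, the logistic models are fit with a plain Newton-Raphson loop in NumPy, and the chi-square tail probability uses the closed form available for integer degrees of freedom. The LRT statistic 2(ℓ_full − ℓ_reduced) is referred to a chi-square distribution with N degrees of freedom, N being the number of pathway PCs that the full model adds to the reduced one.

```python
import math
import numpy as np


def select_pcs(genotypes, var_threshold=0.5, min_pcs=3):
    """Standardize a participants x SNPs matrix of 0/1/2 minor-allele counts and
    return the scores of the first N PCs reaching var_threshold of the total
    variance, with N >= min_pcs."""
    g = np.asarray(genotypes, dtype=float)
    g = g[:, g.std(axis=0) > 0]            # monomorphic SNPs cannot be standardized
    z = (g - g.mean(axis=0)) / g.std(axis=0)
    # Squared singular values are proportional to the variance of each PC.
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    cum = np.cumsum(s**2) / np.sum(s**2)
    n = int(np.searchsorted(cum, var_threshold)) + 1
    n = min(max(n, min_pcs), len(s))
    return u[:, :n] * s[:n], n


def logistic_loglik(X, y, n_iter=100, tol=1e-10):
    """Fit a logistic regression (with intercept) by Newton-Raphson and return
    the maximized log-likelihood."""
    X = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(hess, X.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    eta = X @ beta
    return float(np.sum(y * eta - np.logaddexp(0.0, eta)))


def chi2_sf(x, df):
    """Upper-tail chi-square probability P(X > x) for an integer df."""
    h = x / 2.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= h / j
            total += term
        return math.exp(-h) * total
    term, total = math.sqrt(h) / math.gamma(1.5), 0.0
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= h / (j + 0.5)
    return math.erfc(math.sqrt(h)) + math.exp(-h) * total


def pathway_lrt(genotypes, covariates, y):
    """LRT of the full model (covariates + pathway PCs) vs. covariates only."""
    pcs, n = select_pcs(genotypes)
    ll_full = logistic_loglik(np.column_stack([covariates, pcs]), y)
    ll_reduced = logistic_loglik(covariates, y)
    stat = max(2.0 * (ll_full - ll_reduced), 0.0)
    return stat, n, chi2_sf(stat, n)


# Usage on synthetic data: 600 participants, 40 SNPs, and 6 random covariates
# standing in for age, sex, and the four ancestry PCs.
rng = np.random.default_rng(0)
G = rng.binomial(2, 0.3, size=(600, 40))
covs = rng.normal(size=(600, 6))
y = rng.integers(0, 2, size=600).astype(float)
stat, n_pcs, p_value = pathway_lrt(G, covs, y)
print(f"LRT = {stat:.2f} on {n_pcs} df, P = {p_value:.3f}")
```

The genotypes, covariates, and outcome here are random placeholders, so the P value carries no meaning; with real data, the case rows (BE or EA) and all control rows would be stacked into one matrix as the text describes.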

To prioritize genes within a selected pathway for gene-level analysis, we examined SNP loadings on the first pathway-level principal component (PC1,p). SNPs were rank-ordered by the absolute values of their PC1,p loading coefficients, and the first ten distinct genes represented by these rank-ordered SNPs were advanced to gene-level analysis. PCA was conducted for each of these genes using a genotype matrix composed of all SNPs assigned to that gene; the first N PCs that captured ≥50% of the genotypic variance were selected (minimum of three: N≥3). The association between a given gene and risk of BE or EA was assessed with the LRT as above, comparing i) a full model including the selected gene-level PCs (PC1,g-PCN,g), age, sex, and PC1,AIM-PC4,AIM; and ii) a nested reduced model containing age, sex, and PC1,AIM-PC4,AIM. Multiple comparisons (n=10) were accounted for using the FDR method.
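The gene-prioritization rule (rank SNPs by the absolute value of their PC1 loading, then keep the first ten distinct genes) can be illustrated with a short sketch. The helper name `prioritize_genes` and the synthetic data are hypothetical; the sketch assumes every SNP column is polymorphic, which the MAF ≥1% filter is meant to ensure.

```python
import numpy as np


def prioritize_genes(genotypes, snp_genes, n_genes=10):
    """Rank SNPs by |loading| on the first PC of the standardized genotype
    matrix and return the first n_genes distinct genes in that ranking."""
    g = np.asarray(genotypes, dtype=float)
    z = (g - g.mean(axis=0)) / g.std(axis=0)
    # Rows of vt are PC loading vectors; vt[0] holds the PC1 loadings.
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    order = np.argsort(-np.abs(vt[0]), kind="stable")
    selected = []
    for idx in order:
        gene = snp_genes[idx]
        if gene not in selected:
            selected.append(gene)
            if len(selected) == n_genes:
                break
    return selected


# Synthetic example: gene "A" has four perfectly correlated SNPs, so it
# dominates PC1; genes "B" to "E" have two independent SNPs each.
rng = np.random.default_rng(1)
shared = rng.binomial(2, 0.3, size=(500, 1))
G = np.hstack([rng.binomial(2, 0.3, size=(500, 8)), np.repeat(shared, 4, axis=1)])
genes = ["B", "B", "C", "C", "D", "D", "E", "E", "A", "A", "A", "A"]
print(prioritize_genes(G, genes, n_genes=3))
```

Because a gene contributes to the ranking through its single best-loading SNP, a gene with many SNPs in a strongly correlated block tends to rank highly; this is one reason prioritization by PC1 loadings favors genes that explain a large share of pathway-level variance.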

Genes satisfying FDR q<0.05 were selected for additional analysis at the SNP level. Unconditional logistic regression was used to compute odds ratios (ORs) for risk of BE or EA associated with a given SNP variant, under an additive model (per-allele) with adjustment for age, sex, and PC1,AIM-PC4,AIM, and correcting for multiple comparisons via the FDR method. Observed associations were visualized graphically using LocusZoom [39].
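The text specifies "the FDR method" without naming a procedure. The sketch below assumes the Benjamini-Hochberg step-up procedure, the most common choice, and the helper name `bh_qvalues` is hypothetical. Applied to the rounded BE P values printed in Table 2, it returns q values within 0.01 of the published ones (0.03, 0.21, 0.21, 0.74, 0.84), and for the EA P values it reproduces the uniform q of 0.60; small discrepancies are expected if the published q values were computed from unrounded P values.

```python
import numpy as np


def bh_qvalues(pvalues):
    """Benjamini-Hochberg FDR q values (step-up adjusted P values)."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)      # p_(i) * m / i
    # Running minimum from the largest P value downward keeps q monotone.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(scaled, 1.0)
    return q


# BE pathway-level P values from Table 2, in table order:
# COX, cytokines, oxidative stress, HLA, NFkB.
q_be = bh_qvalues([0.006, 0.10, 0.13, 0.59, 0.84])
print(np.round(q_be, 3))
```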

Statistical analyses were conducted using Stata/SE version 14 (StataCorp, College Station, TX).

An independent dataset of 1,851 BE patients and 3,496 control participants from the UK, described previously [25], was used for validation. Genotyping was performed on the Illumina Human 660W-Quad and Human 1.2M-Duo array platforms. Summary statistics for the associations between 13 genotyped SNPs at the MGST1 locus and risk of BE were extracted and combined with the primary results in a meta-analysis using the inverse-variance weighting method [41]. Validation analyses were conducted in R v3.2.1.
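A fixed-effect inverse-variance-weighted meta-analysis pools per-study log odds ratios with weights 1/SE². The sketch below is a minimal illustration, not the authors' R code; the helper names `se_from_ci` and `ivw_meta` are hypothetical. It recovers a log-scale SE from the 95% CI reported in Table 4 for rs4149203 and pools it with a purely hypothetical second study (OR 1.05, SE 0.05), so its output is not meant to reproduce Table 5.

```python
import math

Z_975 = 1.959963984540054   # standard normal 97.5th percentile


def se_from_ci(lower, upper):
    """Log-scale SE of an OR recovered from its 95% confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2 * Z_975)


def ivw_meta(log_ors, ses):
    """Fixed-effect inverse-variance-weighted meta-analysis of log ORs.
    Returns the pooled OR, its log-scale SE, and a two-sided Wald P value."""
    weights = [1.0 / se**2 for se in ses]
    beta = sum(w * b for w, b in zip(weights, log_ors)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    p = math.erfc(abs(beta / se) / math.sqrt(2.0))   # = 2 * (1 - Phi(|z|))
    return math.exp(beta), se, p


# BEACON estimate for rs4149203 (OR 1.16, 95% CI 1.08-1.26, Table 4) pooled
# with a HYPOTHETICAL second study (OR 1.05, SE 0.05).
se_beacon = se_from_ci(1.08, 1.26)
pooled_or, pooled_se, p = ivw_meta([math.log(1.16), math.log(1.05)], [se_beacon, 0.05])
print(f"pooled OR = {pooled_or:.3f}, SE(log OR) = {pooled_se:.4f}, P = {p:.2e}")
```

Recovering the SE from a rounded CI introduces some rounding error, so in practice the unrounded study-level estimates and SEs would be used.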

Results

Characteristics of study participants

The distributions of demographic and behavioral characteristics among control participants, BE case patients, and EA case patients are shown in Table 1. EA cases were somewhat older and more often male compared to controls and BE cases. The percentage reporting ever having smoked cigarettes was higher among BE and EA cases than among controls, and heavy smoking (45+ pack years) was more prevalent among EA cases. Obesity (BMI 30+) and weekly reflux/heartburn were more prevalent among BE and EA cases than among controls. NSAID use appeared similarly common across the three groups. Relative to controls, a substantially higher percentage of participants with BE and EA were classified as having a high composite “inflammation score”, based on BMI, smoking history, and reflux symptoms.

Table 1.

Study participant characteristics.

Characteristic  Controls(#) (n=3207): N, %  BE (n=3295): N, %  EA (n=2515): N, %
Age (years)
 <50 726 22.6 449 13.7 189 7.6
 50–59 885 27.6 780 23.7 547 21.9
 60–69 963 30.0 1011 30.7 884 35.4
 70+ 633 19.7 1048 31.9 875 35.1
Sex
 Female 880 27.4 806 24.5 320 12.7
 Male 2327 72.6 2489 75.5 2195 87.3
BMI (kg/m²)
 <25 786 36.3 425 20.7 245 24.6
 25–29.99 944 43.6 882 42.9 455 45.7
 30–34.99 307 14.2 521 25.3 201 20.2
 35+ 130 6.0 230 11.2 95 9.5
Smoking status
 No 889 40.9 798 33.7 348 24.7
 Yes 1284 59.1 1570 66.3 1062 75.3
Smoking (p–y)(a)
 None 889 41.3 798 44.5 348 32.8
 <15 358 16.6 320 17.9 156 14.7
 15–29 326 15.1 232 12.9 160 15.1
 30–44 273 12.7 198 11.0 173 16.3
 45+ 309 14.3 244 13.6 225 21.2
NSAID use
 Never 814 44.0 503 42.8 381 46.2
 Ever 1038 56.0 672 57.2 444 53.8
Reflux/heartburn(b)
 No 1448 80.6 957 49.0 563 56.2
 Yes 349 19.4 996 51.0 438 43.8
Inflammation score
 Low 818 46.2 235 14.3 142 20.9
 Medium 381 21.5 262 16.0 137 20.2
 High 571 32.3 1143 69.7 400 58.9

Numbers do not add to total subjects because of missing data.
(#) Three participants were excluded from the control group for comparison with BE case patients because of relatedness.
(a) p–y, pack-years.
(b) Weekly symptoms.

Pathway-level associations with risk of BE or EA

To obtain a top-level, global assessment of the association between germline variation within five selected inflammation-related pathways (COX, cytokine signaling, oxidative stress, HLA, and NFκB) and risk of BE or EA, we employed a PCA-based approach. In logistic regression analyses that included a subset of the derived principal components as predictors, we identified a single significant (P<0.05) pathway-level signal for risk of BE: the COX pathway (P=0.006) (Table 2). This association remained significant after accounting for multiple comparisons (FDR q=0.03). None of the five pathways examined was associated (P<0.05) with risk of EA.

Table 2.

Assessment of pathway-level associations with risk of Barrett’s esophagus (BE) or esophageal adenocarcinoma (EA).

Pathway  Genes  Variants(a)  BE: PCs(b)  BE: P(c)  BE: q(d)  EA: PCs(b)  EA: P(c)  EA: q(d)
COX 40 1241 40 0.006 0.03 40 0.20 0.60
Cytokines 184 2622 110 0.10 0.21 109 0.28 0.60
Oxidative stress 115 1958 73 0.13 0.21 73 0.58 0.60
Immune/HLA 32 1036 10 0.59 0.74 10 0.60 0.60
NFκB 125 1681 110 0.84 0.84 109 0.42 0.60

(a) Total number of single nucleotide polymorphisms (SNPs) selected for analysis.
(b) Pathway-level principal components (PCs) included in the logistic regression model.
(c) Likelihood ratio test P value.
(d) False discovery rate (FDR) q value.

Gene-level associations with risk of BE

The observed association between variation in the COX pathway and risk of BE could reflect the summation of many small signals distributed across many genes, or relatively concentrated signals in one or a few genes. To distinguish between these possibilities and determine whether individual COX pathway genes might account for the association, we undertook gene-level analyses using the same PCA framework applied at the pathway level. Of the 40 genes assigned to the COX pathway, we prioritized 10 for further analysis based on their contribution to the overall pathway-level genotypic variance, as reflected in the rank-ordered SNP loading coefficients on the first principal component. Among these 10 genes assessed for associations with risk of BE (Table 3), only one exhibited a significant signal: microsomal glutathione S-transferase 1 (MGST1) (P=0.0005). This association remained significant after correction for multiple comparisons (FDR q=0.005). A non-significant (P=0.07) association was observed between gene-level variation in MGST1 and risk of EA (Table S2).

Table 3.

Gene-level associations with risk of BE for the first 10 prioritized COX pathway genes.

Rank  Gene  Gene name  Variants(a)  PCs(b)  P(c)  q(d)
1 MGST1 Microsomal glutathione S-transferase 1 36 3 0.0005 0.005
2 PTGER3 Prostaglandin E receptor 3 (subtype EP3) 185 4 0.11 0.51
3 PPARG Peroxisome proliferator-activated receptor gamma 121 3 0.15 0.51
4 TBXAS1 Thromboxane A synthase 1 (platelet) 176 5 0.29 0.58
5 IL12RB2 Interleukin 12 receptor, beta 2 29 3 0.29 0.58
6 CYP19A1 Cytochrome P450, family 19, subfamily A, polypeptide 1 50 3 0.40 0.66
7 MMP2 Matrix metallopeptidase 2 25 3 0.48 0.69
8 PPARA Peroxisome proliferator-activated receptor alpha 54 3 0.72 0.80
9 MGST2 Microsomal glutathione S-transferase 2 57 4 0.72 0.80
10 PTGES Prostaglandin E synthase 11 3 1.00 1.00

(a) Total number of SNPs selected for analysis of the indicated gene.
(b) Gene-level principal components (PCs) included in the logistic regression model.
(c) Likelihood ratio test P value.
(d) False discovery rate (FDR) q value.

SNP-level associations with risk of BE

Individual SNPs located within or near (±2.0 kb) the MGST1 locus were assessed for associations with risk of BE. Among 36 such variants, 23 exhibited a nominally significant (P<0.05) association, and 14 of these 23 remained significant after correction for multiple comparisons (n=36; FDR q<0.05) (Table 4). The minor alleles at all 14 SNPs were associated with elevated risk of BE, with ORs ranging from 1.10 to 1.38. The most significant association was for rs4149203 C>T (OR=1.16, P=9.0 × 10⁻⁵, q=0.001). A LocusZoom plot of the 36 assessed SNPs revealed a cluster of six nearby variants in high linkage disequilibrium (LD; r²>0.8) with rs4149203 (Figure 1). A second cluster of six SNPs satisfying FDR q<0.05 was situated at the 5′ end of the MGST1 locus (Figure S2); modest to moderate LD was observed between rs2239676, the top-ranked SNP in this region, and the other five nearby variants. Based on data from the NIH Roadmap Epigenomics Project [42], three of these 5′ polymorphisms (rs2239676, rs2239677, and rs2975138) lie within a 1.2-kilobase segment that spans the MGST1 transcriptional start site and is characterized by active chromatin marks in esophageal tissue (Figure S3). Among the 14 susceptibility signals identified for BE, 11 were also associated with increased risk of EA (P<0.05); eight of these 11 remained significant after adjustment for multiple comparisons (q<0.05), with ORs ranging from 1.10 to 1.17 (Table S3).
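Figure 1 colors SNPs by their r² with the index SNP, computed from 1000 Genomes Project data. When only unphased genotypes are at hand, a commonly used rough proxy is the squared correlation of 0/1/2 genotype dosages. The sketch below uses that proxy (the helpers `genotype_r2` and `ld_cluster` and the SNP labels are hypothetical, and the data are synthetic); it is not the same quantity as the haplotype-based r² in the figure and can differ from it.

```python
import numpy as np


def genotype_r2(dosage_a, dosage_b):
    """Squared Pearson correlation between two SNPs' 0/1/2 genotype dosages."""
    r = np.corrcoef(np.asarray(dosage_a, float), np.asarray(dosage_b, float))[0, 1]
    return float(r * r)


def ld_cluster(dosages, snp_ids, index_snp, threshold=0.8):
    """SNPs (columns of `dosages`) whose r^2 with `index_snp` exceeds `threshold`."""
    i = snp_ids.index(index_snp)
    return [sid for j, sid in enumerate(snp_ids)
            if j != i and genotype_r2(dosages[:, i], dosages[:, j]) > threshold]


# Synthetic example: "snp2" copies the index SNP except for 20 re-drawn
# genotypes; "snp3" is drawn independently.
rng = np.random.default_rng(2)
index = rng.binomial(2, 0.3, size=1000)
near_copy = index.copy()
near_copy[:20] = rng.binomial(2, 0.3, size=20)
unrelated = rng.binomial(2, 0.3, size=1000)
D = np.column_stack([index, near_copy, unrelated])
print(ld_cluster(D, ["snp1", "snp2", "snp3"], "snp1"))
```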

Table 4.

Assessment of MGST1 SNPs (n=36) and risk of BE(#)

Rank  SNP  Chr  Position  Alleles(a)  Controls: N, MAF(b)  BE cases: N, MAF(b)  OR(c) (95% CI)  P  q(d)
1 rs4149203 12 16514921 T/C 3203 0.308 3288 0.346 1.16 (1.08–1.26) 0.0001 0.001
2 rs3852575 12 16516260 T/C 3203 0.304 3288 0.34 1.16 (1.08–1.25) 0.0001 0.001
3 rs7312090 12 16515945 T/C 3203 0.304 3288 0.34 1.16 (1.07–1.25) 0.0002 0.001
4 rs4149204 12 16515062 C/T 3203 0.307 3288 0.342 1.16 (1.07–1.25) 0.0002 0.001
5 rs4149207 12 16517491 T/C 3203 0.306 3288 0.338 1.14 (1.06–1.23) 0.0008 0.005
6 rs4149208 12 16517581 T/C 3203 0.306 3288 0.338 1.14 (1.06–1.23) 0.0008 0.005
7 rs3759207 12 16516710 C/T 3203 0.31 3288 0.34 1.14 (1.05–1.23) 0.0012 0.006
8 rs4149195 12 16512128 G/A 3203 0.109 3288 0.125 1.20 (1.07–1.35) 0.0013 0.006
9 rs2239676 12 16500448 G/C 3203 0.096 3288 0.113 1.19 (1.06–1.34) 0.0033 0.013
10 rs4149187 12 16500071 G/C 3203 0.098 3288 0.114 1.18 (1.05–1.32) 0.0061 0.022
11 rs2239677 12 16500680 A/G 3203 0.021 3288 0.027 1.38 (1.09–1.75) 0.0077 0.025
12 rs2239675 12 16500265 G/A 3203 0.172 3288 0.187 1.12 (1.02–1.23) 0.0172 0.049
13 rs4149186 12 16498700 C/A 3203 0.215 3288 0.235 1.11 (1.02–1.21) 0.0179 0.049
14 rs2975138 12 16501551 A/G 3203 0.237 3288 0.256 1.10 (1.02–1.20) 0.0192 0.049

(#) Results for the n=14 SNPs satisfying FDR q<0.05.
(a) Minor/major alleles.
(b) Minor allele frequency.
(c) Odds ratio, adjusted for age, sex, and PC1,AIM-PC4,AIM, using an additive (per-allele) model.
(d) False discovery rate (FDR) q value.

Figure 1. Regional association plot for n=36 genotyped SNPs at the MGST1 gene locus.


The top-ranked SNP associated with risk of BE is shown in solid purple. SNPs are ordered by genomic location. The color scheme indicates LD between the top-ranked SNP and other SNPs in the region, using r² values calculated from the 1000 Genomes Project. The left y axis shows −log₁₀(P) values computed from 3,295 BE cases and 3,204 controls. The recombination rate from CEU (Utah residents of Northern and Western European ancestry) HapMap data (right y axis) is shown in light blue.

Assessment of top MGST1 SNPs and risk of BE in an independent study sample

We next evaluated whether any of the 14 MGST1 variants associated with risk of BE in our primary analysis were similarly associated with BE risk in a large, independent sample set from the UK comprising 1,851 BE patients and 3,496 control participants. Thirteen of the 14 SNPs were available for analysis; of these, four variants exhibited borderline (P<0.10) associations with BE: rs3852575, rs4149204, rs7312090, and rs4149203 (Table 5). ORs for these SNPs were in the same direction as in the primary analysis, though slightly smaller in magnitude (1.08 versus 1.16). In the subsequent meta-analysis, all four of these variants were significantly associated with BE (P<5.5 × 10⁻⁵), and six additional SNPs reached P<0.05.

Table 5.

Assessment of MGST1 SNPs and risk of BE in an independent study sample of 1,851 BE cases and 3,496 control participants (Oxford)(#).

Rank  SNP  Effect allele(a)  BEACON OR  BEACON P  Oxford OR  Oxford P  Meta-analysis OR  Meta-analysis P
1 rs4149203 T 1.16 0.0001 1.08 0.0718 1.13 3.46E-05
2 rs3852575 T 1.16 0.0001 1.08 0.0661 1.13 4.04E-05
3 rs7312090 T 1.16 0.0002 1.08 0.0678 1.13 5.12E-05
4 rs4149204 C 1.16 0.0002 1.08 0.0668 1.13 5.25E-05
5 rs4149207 T 1.14 0.0008 1.05 0.2676 1.10 0.0011
6 rs4149208 T 1.14 0.0008 1.05 0.2837 1.10 0.0013
7 rs3759207 C 1.14 0.0012 1.05 0.2649 1.10 0.0015
8 rs4149195 G 1.20 0.0013 1.09 0.2160 1.15 0.0012
9 rs2239676 G 1.19 0.0033 0.99 0.9402 1.10 0.0293
10 rs4149187 G 1.18 0.0061 0.99 0.8894 1.09 0.0461
11 rs2239675 G 1.12 0.0172 1.00 0.9882 1.07 0.0704
12 rs4149186 C 1.11 0.0179 0.99 0.7973 1.05 0.1081
13 rs2975138 A 1.10 0.0192 1.01 0.8523 1.06 0.0605

(#) Results for the n=13 SNPs available for analysis among the 14 variants listed in Table 4.
(a) Effect allele (all ORs represent per-allele risk estimates under an additive model).

Expression quantitative trait locus (eQTL) analysis

To explore whether any of the 14 MGST1 SNPs in Table 4 are also correlated with MGST1 RNA expression levels in the esophagus, we conducted in silico eQTL analyses using the Genotype-Tissue Expression (GTEx) database [43]. Of the 13 SNPs with available genotype and expression data in esophageal mucosa, two variants were associated (P<0.05) with differential MGST1 expression: rs4149186 A>C (P=7.9 × 10⁻⁵) and rs2975138 G>A (P=1.20 × 10⁻⁷) (Table S4). A third variant, rs4149203 C>T, showed a borderline association (P=0.074). For each of these three SNPs, the allele associated with increased risk of BE was also correlated with reduced MGST1 expression (Figure S1).

Discussion

Chronic inflammation may occur as a result of multiple exposures established as risk factors for BE and EA (gastroesophageal reflux, obesity, smoking) and is thought to represent a common pathway underlying the emergence and progression of these conditions [3, 44]. This study represents the first systematic examination of the relationship between germline genetic variation in inflammation-related pathways (COX, cytokine signaling, oxidative stress, HLA, and NFκB) and risks of BE and EA using a principal components-based analytic framework. Drawing on genetic data from a large consortium-based GWAS [24], we found a significant association between variation in the COX pathway and risk of BE, and identified a gene-level signal for MGST1. SNP-level analyses identified 14 individual MGST1 variants associated with elevated disease risk, including four intronic variants that were subsequently confirmed (P<5.5 × 10⁻⁵) in a meta-analysis incorporating a large independent sample set of additional BE cases and controls.

MGST1 is one of three microsomal glutathione S-transferase (GST) enzymes in humans, and belongs to a larger GST gene family encoding a number of proteins responsible for neutralizing oxidative stress through conjugation of endogenous and xenobiotic lipophilic electrophiles with glutathione [45, 46, 47]. MGST1 shares ~40% sequence homology at the amino acid level with prostaglandin E synthase (PTGES, formerly MGST1L1), a key enzyme that acts downstream of cyclooxygenases to catalyze the production of PGE2 from PGH2 [48]. MGST1–3 and PTGES belong to the “MAPEG” super-family of membrane-associated proteins in eicosanoid and glutathione metabolism. Microsomal GST1 is localized to the endoplasmic reticulum and outer mitochondrial membrane, and plays an important role in suppressing lipid peroxidation and protecting mitochondrial integrity [46]. Multiple alternatively spliced transcripts arise from the MGST1 gene locus, and the MGST1 promoter region has been shown to be transcriptionally responsive to oxidative stress [45]. Some evidence exists for an association between genetic variation in the MGST1 gene and altered risk of colorectal cancer in Han Chinese [49].

The 14 MGST1 SNPs found to be associated with risk of BE in our primary analysis were physically clustered into two main groups, one at the 3′ end of the gene and the other at the 5′ end, which may reflect two (or more) independent signals. The most significant association was for rs4149203 C>T, a 3′ intronic variant in strong LD with the six other associated 3′ SNPs (r²>0.8). Four of these seven SNPs, including rs4149203, were confirmed in the meta-analysis phase of our validation studies. These variants lie in a region defined by enhancer histone marks, and they modify predicted sequence motifs for several transcription factors (e.g., POU5F1, SOX, BRCA1, FOXP1; Table S5) [50]. Of interest, FOXP1 was identified as a susceptibility locus for EA in our previous report [24] and was recently replicated in an independent study [28]. Published eQTL data from a study of gene expression in various brain regions indicated that rs4149203 (and correlated SNPs) may be associated with altered MGST1 expression in cerebellum [50, 51]. The proximity of several of these variants to an MGST1 splice junction also suggests a potential influence on (alternative) splicing regulation.

At the 5′ end of MGST1, rs2239676 C>G was the top signal identified among a cluster of six variants associated with BE risk. Three of these variants lie in close proximity to the MGST1 transcriptional start site, within a region characterized as active chromatin in esophageal tissue; recruitment of Pol II and several transcription factors (e.g., Hey1, MYC/MAX) has been reported [42]. Our in silico eQTL analyses based on data from the GTEx project indicated that rs2975138 G>A and rs4149186 A>C, in particular, correlate with reduced MGST1 expression in esophageal mucosa. The rs2975138 variant modifies predicted motifs for estrogen receptor-alpha, Pax5, and Zfx, while rs4149186 alters recognition sequences for FoxA, FoxJ2, and Nkx2, among other regulators [50]. Given that these variants were not validated in the Oxford (UK) dataset, however, their association with BE risk remains uncertain. As a further qualification, we note that the GTEx eQTL analyses appear to have been conducted using normal esophageal squamous epithelium, which, based on emerging findings, may not be the tissue of origin for Barrett’s epithelium [52].

The findings described above suggest that several of the identified variants may influence MGST1 RNA expression levels. Additional studies, however, are warranted to investigate potential associations between selected variants and tissue-specific MGST1 expression, and to explore a possible causal basis for the observed findings. Because BE and EA often arise within an epithelium chronically exposed to refluxate and to cigarette-associated toxins (i.e., exposures associated with inflammation), it would be of interest to determine experimentally whether MGST1 plays a protective role in counteracting such insults and maintaining tissue homeostasis.

Given that BE is the only known precursor of EA, one expectation is that risk factors linked to altered risk of BE would be associated with similar alterations in risk of EA. In this study, variation in the COX pathway as a whole met the threshold for significance in relation to risk of BE, but not EA. Our subsequent analyses, however, revealed that a number of the individual MGST1 SNPs found to be associated with risk of BE did in fact exhibit similar associations with risk of EA (Table S3). With respect to top SNP-level signals, the associated ORs for EA were in the same direction as, and of comparable magnitude to, those observed for BE. This strong level of concordance suggests that the identified variants, if causal, may influence disease risk primarily at the level of BE, rather than progression from BE to EA.

Genes in our analysis assigned to the COX pathway included those coding for the two COX enzymes, prostaglandin and thromboxane synthases and receptors, aldo-keto reductases, peroxisome proliferator-activated receptors (PPAR), matrix metallopeptidases, microsomal glutathione S-transferases, and a small assortment of growth factors (VEGF) and interleukins or interleukin receptors. Previous candidate gene-based studies have reported associations between germline variation in PTGS2 (COX-2) and altered risk of EA [53, 54], while independent epidemiologic evidence has supported an inverse association between use of NSAIDs (inhibitors of COX-1 and COX-2 activity) and risk of EA [7, 8]. Our gene-level and SNP-level analyses did not include all genes assigned to the COX pathway (e.g., PTGS2), as only a limited subset was advanced for further study based on pre-specified selection criteria (the top 10 genes in PC1; see Table 3). It remains possible, therefore, that associations between disease risk and variation in other COX pathway genes are present in our dataset and contribute in part to the observed pathway-level signal.

One of the main strengths of our study was the use of a PCA framework to assess pathway-level and gene-level associations between germline genetic variation and risk of BE or EA. PCA is an effective strategy for reducing data dimensionality. In this report, we adapted PCA to pathway-level and gene-level genetic analysis, and implemented a hierarchical strategy to identify genetic variants associated with traits. Applying PCA to GWAS data offered key advantages over conventional marginal analyses based exclusively on evaluation of individual SNPs. First, by aggregating signals across multiple genes (of a given pathway) or across multiple SNPs (of a given gene), the PCA method increased our ability to detect associations characterized by multiple, independent, distributed low-magnitude signals. Second, by reducing the dimensionality of the genotype matrix, PCA appreciably reduced the number of comparisons and thereby increased our statistical power. Our tiered analysis plan further specified that only genes within significant pathways (in practice, only a subset of them) were assessed at the gene level, and only variants within significant genes were evaluated at the SNP level.
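The gene-level test described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes genotypes coded as allele dosages (0/1/2), retains the leading principal components that together explain a hypothetical 80% of genotype variance, and compares logistic regression models with and without those components using a likelihood-ratio test. The function names (`top_pcs`, `logistic_loglik`, `pc_lrt_pvalue`) and the variance threshold are illustrative assumptions; a pathway-level version could apply the same test to SNPs pooled across a pathway's genes.

```python
import numpy as np
from scipy.stats import chi2


def top_pcs(genotypes, var_explained=0.8):
    """Return PC scores for the leading components covering `var_explained`
    of total genotype variance (the 0.8 default is a hypothetical choice).

    genotypes: (n_samples, n_snps) array of allele dosages (0, 1, 2).
    """
    centered = genotypes - genotypes.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    frac = np.cumsum(s**2) / np.sum(s**2)
    # Smallest number of components whose cumulative variance reaches the threshold.
    k = int(np.searchsorted(frac, var_explained) + 1)
    return u[:, :k] * s[:k]  # scores = centered @ V = U * S


def logistic_loglik(X, y, n_iter=50):
    """Fit logistic regression by Newton-Raphson; return the maximized log-likelihood."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1 - p)
        grad = X.T @ (y - p)
        hess = X.T @ (X * w[:, None])
        beta += np.linalg.solve(hess, grad)
    p = np.clip(1.0 / (1.0 + np.exp(-X @ beta)), 1e-12, 1 - 1e-12)
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))


def pc_lrt_pvalue(genotypes, y, covariates=None):
    """Likelihood-ratio test of case status on the top genotype PCs.

    One test with k degrees of freedom replaces one test per SNP.
    """
    n = len(y)
    if covariates is None:
        base = np.ones((n, 1))
    else:
        base = np.column_stack([np.ones(n), covariates])
    pcs = top_pcs(genotypes)
    ll_null = logistic_loglik(base, y)
    ll_full = logistic_loglik(np.column_stack([base, pcs]), y)
    stat = 2 * (ll_full - ll_null)
    return chi2.sf(stat, df=pcs.shape[1])
```

Testing k components with a single k-degree-of-freedom statistic, rather than testing every SNP separately, is what reduces the multiple-comparison burden; in a tiered design, only genes (or pathways) passing this test would be carried forward to SNP-level analysis.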

Another important strength was the use of pooled data from the BEACON GWAS, which provided the largest sample size to date for evaluating inflammation-related germline variation in relation to risks of BE and EA. Because we analyzed both BE and EA, we had the opportunity to compare genetic variation associated with risk of a neoplastic precursor lesion and of the cancer that arises from it. Our assessment of 7,863 SNPs in 449 genes assigned to five pathways substantially expands on previous candidate gene-based efforts to examine genetic variation in inflammation-related loci in relation to risk of BE and EA.

This study also had certain limitations. First, while our tiered analysis scheme enabled us to restrict the number of comparisons and boost statistical power, it also narrowed the scope of our analysis and may have caused us to miss association signals. Variation in four of the five included pathways was not examined at the gene or SNP level, and only 25% of the genes in the COX pathway were advanced beyond pathway-level assessment. Second, because of the hierarchical nature of our statistical analysis, in which we first assessed significance at the pathway level and proceeded to the gene level only for ‘significant’ pathways, the P values obtained for individual genes, and subsequently for individual SNPs, should be interpreted as conditional on that pathway (or gene) having already been selected, i.e., P(A|B), where B represents the event that a pathway (or gene) is selected, and A represents the event that a gene (or SNP) is significant. This conditional probability framework was well suited to our use of PCA as a discovery-phase approach for identifying potential novel association signals, which were subsequently confirmed in an independent sample set. Third, while our study provided broad coverage of several major biological pathways of probable relevance to BE/EA, it almost certainly omitted a number of important genes or genomic loci. Cytokine signaling, NFκB activity, and oxidative stress, for example, are complex processes likely influenced by hundreds or more gene products and by a large number of intergenic loci harboring enhancer/insulator transcriptional elements and non-coding RNAs. The present analysis, however, was largely restricted to common germline variants located within or in close proximity (within 2.0 kb) to defined protein-coding genes.
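For reference, the conditional interpretation above rests on the standard definition of conditional probability, restated here with the same events A and B. Because B (selection of a pathway or gene) was determined from the same data used to evaluate A, P(A|B) will generally differ from the unconditional P(A), which is why gene- and SNP-level signals from this discovery phase required confirmation in an independent sample set.

```latex
P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \qquad P(B) > 0
```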

In conclusion, our study represents the most comprehensive evaluation to date of inflammation-related inherited genetic variation in relation to risk of BE and EA. Using a PCA framework for pathway-level and gene-level analyses, we describe evidence for novel associations between variation at the MGST1 locus and increased risk of BE. Certain of the associated variants may influence expression levels of MGST1, a gene with known roles in the cellular response to oxidative stress. In addition to further validation in other study populations, future studies are warranted to fine-map the identified association signals, experimentally assess the functional effects of these variants, and explore the biological role of MGST1 in BE/EA pathogenesis.

Supplementary Material

SUPP

Acknowledgments

We thank Terri Watson, Tricia Christopherson, and Paul Hansen for their contributions in project management and organization of biospecimens/data. We thank Dr. Liam Murray (Queen’s University Belfast, UK) for access to GWAS data derived from subjects enrolled in his past studies of EA and BE (FINBAR Study). Genotyping data for MD Anderson control participants [42] was obtained from dbGaP through accession number phs000187.v1.p1. Genotyping data generated in the BEACON GWAS has been deposited into dbGaP under accession number phs000869.v1.p1.

Funding

This work was principally supported by the National Institutes of Health [R21DK099804 to M.M.M., and R01CA136725 to T.L.V. and D.C.W.]. Support for studies related to the Oxford dataset was granted by the Esophageal Adenocarcinoma GenE Consortia incorporating the ChOPIN project (grant C548/A5675) and Inherited Predisposition of neoplasia analysis of genomic DNA (IPOD) from AspECT and BOSS clinical trials project (grant MGAG1G7R); Cancer Research UK (AspECT, grants C548/A4584 and D9612L00090, and Histological AssessmeNt Determining EpitheliaL Response (HANDEL), grant C548/A9085); AstraZeneca UK educational grant; University Hospitals of Leicester R and D grant; and AspECT (T91 5211 University of Oxford grant HDRMJQ0). Additional funding sources for individual studies included in the BEACON GWAS, and for BEACON investigators, have been acknowledged previously [24].

Footnotes

The authors disclose no potential conflicts of interest.

Author contributions

Conception and design: M.M.M., Q.H., M.F.B., L.G.J., T.L.V. Participant recruitment: J.L., R.C.F., W.Y., C.C., N.C.B., N.J.S., L.B., M.D.G., A.H.W., L.J.H., P.D.P., G.L., P.I., D.A.C., H.A.R., W-H.C., I.T., J.J., D.C.W., T.L.V. Analysis and interpretation of data: M.F.B., Q.H., M.M.M., L.G.J., L.O., D.M.L., A.P.T., T.L.V. Drafting of the manuscript: M.F.B., M.M.M., Q.H., T.L.V. Study supervision: M.M.M. All authors critically revised the manuscript for intellectual content.

References
