American Journal of Human Genetics. 2016 Aug 2;99(2):352–365. doi: 10.1016/j.ajhg.2016.06.018

Constrained Score Statistics Identify Genetic Variants Interacting with Multiple Risk Factors in Barrett’s Esophagus

James Y Dai 1,2, Jean de Dieu Tapsoba 1, Matthew F Buas 1; the BEACON Consortium, Harvey A Risch 3, Thomas L Vaughan 1,4
PMCID: PMC4974090  PMID: 27486777

Abstract

Few gene-environment interactions (G × E) have been discovered in cancer epidemiology thus far, in part due to the large number of possible G × E to be investigated and the inherently low statistical power of traditional analytic methods for discovering G × E. We consider simultaneously testing for interactions between several related exposures and a genetic variant in a genome-wide study. To improve power, constrained testing strategies are proposed for multivariate gene-environment interactions at two levels: interactions that have the same direction (one-sided or bidirectional hypotheses) or are proportional to respective exposure main effects (a variant of Tukey’s one-degree-of-freedom test). Score statistics were developed to expedite the genome-wide computation. We conducted extensive simulations to evaluate validity and power performance of the proposed statistics, applied them to the genetic and environmental exposure data for esophageal adenocarcinoma and Barrett’s esophagus from the Barrett’s Esophagus and Esophageal Adenocarcinoma Consortium (BEACON), and discovered three loci simultaneously interacting with gastresophageal reflux, obesity, and tobacco smoking with genome-wide significance. These findings deepen understanding of the genetic and environmental architecture of Barrett’s esophagus and esophageal adenocarcinoma.

Introduction

With the exception of rare familial cancers, cancer risk is usually determined by a complex interplay between multiple genetic and environmental factors.1 Alterations in the somatic genome can result from exposure to external agents (e.g., tobacco smoking) and infectious agents (e.g., Helicobacter pylori) and from the multifaceted effects of certain host characteristics (e.g., obesity), all generally defined to be “environmental exposures” in this report. Individuals differ in their inherited efficiency to neutralize environmental insults and repair genomic damage. Efforts have long been focused on discovering susceptibility genes involved in carcinogenesis and on characterizing how these genes interact with environmental factors.2 Agnostic searches for gene-environment interactions (G × E) have been routinely conducted in genome-wide studies,3 but with very limited success.4, 5 Much discussion has been devoted to methodological issues,6, 7 such as design and estimation,8, 9, 10, 11 power and sample size,12 and mathematical formulation of interactions.13

When multiple environmental factors have been established for a cancer and are available for G × E testing, the standard analytical approach is to interrogate one pair of genotype and exposure at a time and subsequently correct for multiple testing.14, 15 This strategy is readily interpretable, but not optimal when environmental risk factors are related in cancer etiology, so that genotypes involved in the same etiological pathway may interact with all related exposures. This article develops constrained inferential methods for testing multiple G × E simultaneously, leveraging existing biological knowledge on cancer etiology. The methods are motivated by a genome-wide G × E study examining esophageal adenocarcinoma and Barrett’s esophagus (BE/EA [MIM: 614266]) from the Barrett’s Esophagus and Esophageal Adenocarcinoma Consortium (BEACON), an international consortium composed of studies in Australia, Europe, and North America, where the incidence of esophageal adenocarcinoma has been rising sharply in the past four decades.16, 17, 18, 19 Esophageal adenocarcinoma cases are believed to arise from Barrett’s esophagus, a precursor lesion,20 and both conditions share risk factors including gastresophageal reflux symptoms (GERD [MIM: 109350]), obesity, and tobacco smoking.21, 22 The results of genome-wide genetic association and a limited search for G × E among known susceptibility loci in BEACON have been reported elsewhere.23, 24

For esophageal adenocarcinoma and its precursor lesion (Barrett’s esophagus), all three well-established risk factors—obesity, gastresophageal reflux, and tobacco smoking—have been linked to local and systemic inflammation and the downstream consequence of oxidative stress that promotes DNA damage and chromosomal instability.20 It is plausible that SNPs in inflammation pathways may modulate induced defense mechanisms against damage caused by these exposures, thereby interacting with all three exposures in a similar fashion. Furthermore, obesity is associated with gastresophageal reflux in the white population.25 Adjustments of p values from testing one exposure-genotype interaction at a time via Bonferroni correction can be overly conservative. In contrast, simultaneously testing for G × E among multiple environmental factors potentially pools small associations and increases power.

To improve power, we consider multivariate G × E testing strategies in which the directions of multiple G × E under the alternative hypothesis are constrained. Suppose θ1, …, θp are interaction parameters between a genetic variant and p established environmental factors in a logistic regression model for a cancer case-control study. The null hypothesis that there is no G × E for any of the environmental factors is represented by

$$H_0:\ \theta_j = 0 \ \text{ for } j = 1, \ldots, p.$$ (Equation 1)

Classical multivariate tests such as Hotelling’s T² test are designed to detect any departure from the null hypothesis; they therefore lack power to detect specific types of departure that may be considered plausible based on a priori scientific knowledge, e.g., the inflammation pathway in the etiology of esophageal adenocarcinoma. Restricting the alternative hypothesis to be one-sided, θj < 0 for some j, will presumably boost the power; for example, SNPs in inflammatory pathways may be associated with a lower odds ratio for any of abdominal obesity, gastresophageal reflux, and cigarette smoking. The constraint can also be motivated for the other direction: θj > 0. In the form of likelihood ratio tests, this kind of constraint has been investigated for clinical trials with multiple study endpoints.26, 27 However, the critical values for computing p values under the asymptotic null distribution of these one-sided statistics are generally difficult to obtain, and maximizing the likelihood under the one-sided constraint can be cumbersome for genome-wide testing.

In the same vein, a more restrictive, but parsimonious, model for interactions was proposed based on Tukey’s 1-df test:28, 29, 30 gene-gene or gene-environment interactions are proportional to main effects by a fixed constant. In essence, risk associations of genetic variants or environmental exposures are first summed together by a genetic risk score or an environmental risk score, assuming that each group of risk variables share common underlying etiological pathways. More specifically, this test can be motivated by a latent-variable model, where each group of measured markers serves as a surrogate of an underlying biological phenotype.30 A single interaction parameter then summarizes the deviation from the additive contributions of two risk predispositions, possibly in a transformed risk scale.30 Built on this parsimonious model, it has been proposed to jointly test for both genetic associations and gene-gene or gene-environment interactions.30 A complication of this model is that, when there is no genetic association, as is true for the great majority of variants being examined in genome-wide studies, the interaction parameter vanishes and is not identifiable. The usual hypothesis testing statistics are no longer applicable and specialized procedures have to be developed with simulated null distributions.

In this article, we develop score statistics with aforementioned constraints for multiple G × E in a genome-wide study, because score statistics are computationally attractive as regression parameters are estimated only under the null and, further, they have the same local power as Wald and likelihood ratio statistics. We extend the classical one-sided test to a bidirectional test, in which multiple G × E can be either negative or positive, so that one does not have to specify the direction one at a time. For both one-sided test and bidirectional test, we propose score statistics that are easy to compute and have an exact or easily obtainable asymptotic distribution. For the proportional G×E testing, we modify Tukey’s 1-df model to allow for the interaction not dependent on the genetic main association, and we derive finite-sample correction for the mean and the variance of the score statistic. Such correction, though often ignorable at the usual significance level for one test (type I error rate 0.05), is critical to control genome-wide type I errors, as we show in simulations and the BEACON study. We evaluate the performance of the proposed methods in extensive simulations and apply them in the genome-wide G × E study from BEACON.

Material and Methods

Consider a genetic association study with n participants. Let Y denote the disease status, for example cancer patients versus control subjects. Let X denote the vector of p known risk-associated exposures, and let G denote one of many SNPs under investigation, possibly in a genome-wide study. Typically, the study also assembles covariates including age, gender, and top principal components of genetic variation for controlling population stratification, collectively denoted by W. The interest hereafter is to discover whether G modifies risk associations of some exposures in X. The data therefore consist of n independent and identically distributed random vectors (Yi, Xi, Gi, Wi), for i = 1, …, n.

Consider a regression model investigating p statistical interactions simultaneously,

$$g\{E(Y \mid G, X, W)\} = \beta_0 + \beta_1 G + \sum_{j=1}^{p} \beta_{2j} X_j + \beta_3 W + \sum_{j=1}^{p} \theta_j X_j G,$$ (Equation 2)

where β1 is the genetic association when all exposures are 0, β2j is the association of the jth exposure when the genotype score is 0, θj is the interaction between G and Xj, and g is the logit function if Y is a dichotomous outcome or the identity function if Y is a quantitative trait modeled by linear regression.

The null hypothesis for multiple G × E in Equation 2 is expressed in Equation 1, stating that none of the p exposure-risk associations changes with the genotype G. Classical multivariate tests leave the alternative hypothesis unrestricted:

$$H_{1a}:\ \theta_j \neq 0 \ \text{ for at least one } j.$$

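Denote the parameters of interest by $\theta = (\theta_1, \ldots, \theta_p)$ and the nuisance parameters by $\beta = (\beta_0, \beta_1, \beta_{21}, \ldots, \beta_{2p}, \beta_3)$. Let $\alpha = (\beta, \theta)$, let $f(y; \alpha)$ denote the density of Y given covariates, and let

$$\ell(\alpha) = \sum_{i=1}^{n} \log f(Y_i; \alpha), \quad S(\alpha) = (\partial/\partial\alpha)\,\ell(\alpha), \quad \text{and} \quad I(\alpha) = E_\alpha\{(\partial/\partial\alpha)\,\ell(\alpha)\,(\partial/\partial\alpha^T)\,\ell(\alpha)\}$$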

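denote the log-likelihood, the score function, and the information matrix, respectively. $S(\alpha)$ is partitioned into $(S_\beta, S_\theta)$ to conform with the partition $(\beta, \theta)$ of $\alpha$. Similarly, $I(\alpha)$ is partitioned into block matrices $I_{\beta\beta}, I_{\beta\theta}, I_{\theta\beta}, I_{\theta\theta}$. Let $\tilde\alpha$ denote the maximum likelihood estimator under the null hypothesis H0. The classical Rao score statistic for testing H0 against H1a is

$$T_a = \tilde S_\theta^T \left(\tilde I_{\theta\theta} - \tilde I_{\theta\beta} \tilde I_{\beta\beta}^{-1} \tilde I_{\beta\theta}\right)^{-1} \tilde S_\theta,$$

where $\tilde S$ and $\tilde I$ are evaluated at $\tilde\alpha$. When the likelihood is correctly specified, $T_a$ has a $\chi^2_p$ distribution in large samples under the null hypothesis.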
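As a rough illustration of how $T_a$ is assembled from the score vector and the partitioned information matrix, the following Python sketch uses only the standard library. The helper names (`solve`, `efficient_information`, `rao_score`) and the toy numbers are assumptions made for this example and are not taken from the authors' software; the inputs are presumed to have been evaluated already at the null estimate.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination (a square, b a matrix)."""
    n = len(a)
    m = [list(ra) + list(rb) for ra, rb in zip(a, b)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        piv = m[col][col]
        m[col] = [v / piv for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [row[n:] for row in m]


def efficient_information(i_tt, i_tb, i_bb):
    """I_tt - I_tb I_bb^{-1} I_bt, where I_bt is the transpose of I_tb."""
    i_bt = [list(col) for col in zip(*i_tb)]
    x = solve(i_bb, i_bt)  # I_bb^{-1} I_bt, shape q x p
    p, q = len(i_tt), len(i_bb)
    return [[i_tt[r][c] - sum(i_tb[r][k] * x[k][c] for k in range(q))
             for c in range(p)] for r in range(p)]


def rao_score(s_theta, i_eff):
    """Quadratic form S^T I_eff^{-1} S, i.e. the unconstrained statistic T_a."""
    x = solve(i_eff, [[v] for v in s_theta])
    return sum(s * row[0] for s, row in zip(s_theta, x))


# Toy inputs (hypothetical): p = 2 interaction parameters, 1 nuisance parameter.
i_eff = efficient_information([[2.0, 0.5], [0.5, 1.0]], [[1.0], [0.0]], [[2.0]])
t_a = rao_score([1.0, -1.0], i_eff)
```

In practice $T_a$ would then be referred to a chi-square distribution with p degrees of freedom, as stated above.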

Testing for One-Sided or Bidirectional Hypothesis

As we elaborated in the Introduction, it is sometimes plausible in cancer epidemiology to test for one-sided multiple G × E simultaneously, for example

$$H_{1b}:\ \theta_j \le 0 \ \text{ for } j = 1, \ldots, p$$

with strict inequality for at least one j. The interest here is specifically in discovering genetic alleles that are simultaneously linked to a lower association for each of the p exposures. The one-sided test can be motivated similarly for θj ≥ 0, aiming to discover genetic alleles that are linked to a higher association for the exposures. In a completely agnostic search such as genome-wide testing for G × E, it is perhaps more sensible to formulate the following bidirectional hypothesis

$$H_{1c}:\ \theta_j \le 0 \ \text{ for } j = 1, \ldots, p \quad \text{or} \quad \theta_j \ge 0 \ \text{ for } j = 1, \ldots, p$$

with strict inequality for at least one j. The multiple G × E are constrained to be directional, either all positive or all negative. This bidirectional test has not been seen in the statistical literature. In what follows, we derive score tests for H1b and H1c based on an approximated likelihood ratio test.27

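Let $\tilde\beta$ denote the maximum likelihood estimator of $\beta$ under H0, and let $\tilde\alpha = (\tilde\beta, 0)$, where 0 is a length-p zero vector for $\theta$. Let a sequence of local alternative hypotheses be defined by

$$H_{1n}:\ \theta = n^{-1/2}\delta,$$

where $\delta$ is a fixed point in $\mathbb{R}^p$. By employing Taylor series expansions, we have

$$n^{-1/2} S_\theta(\tilde\alpha) \xrightarrow{d} N\left(I_{\theta\theta,\beta}(\alpha_0)\,\delta,\ I_{\theta\theta,\beta}(\alpha_0)\right)$$

and

$$n^{-1/2} I_{\theta\theta,\beta}^{-1}(\tilde\alpha)\, S_\theta(\tilde\alpha) \xrightarrow{d} N\left(\delta,\ I_{\theta\theta,\beta}^{-1}(\alpha_0)\right),$$

under $H_{1n}$, where $\alpha_0 = (\beta, 0)$ and $I_{\theta\theta,\beta} = I_{\theta\theta} - I_{\theta\beta} I_{\beta\beta}^{-1} I_{\beta\theta}$. Let $U = n^{-1/2} I_{\theta\theta,\beta}^{-1}(\tilde\alpha)\, S_\theta(\tilde\alpha)$. Since $\theta \le 0$ is equivalent to $\delta \le 0$, one may define a score test as follows:

$$T_s = n\left[U^T I_{\theta\theta,\beta} U - \inf\{(U - \delta)^T I_{\theta\theta,\beta} (U - \delta) : \delta \le 0\}\right].$$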

This is essentially the likelihood ratio statistic for δ ≤ 0 based on a single realization of U:31 the first term is the log-likelihood of U under the null hypothesis δ = 0, and the second term is the maximum log-likelihood of U under the alternative hypothesis δ ≤ 0. If the information matrix $I_{\theta\theta,\beta}$ is known, the asymptotic null distribution of Ts is a chi-bar-squared distribution, that is, a weighted sum of chi-square distributions, where the weights can be computed algebraically in closed form when p ≤ 4 (refs. 31, 32) and by simulation when p > 4. Alternatively, if $I_{\theta\theta,\beta}$ is diagonal, closed-form expressions of the weights are readily available for any p. In our case, however, $I_{\theta\theta,\beta}$ is unknown, because it depends on β, which has to be estimated. Furthermore, it is unlikely to be a diagonal matrix in the context of testing G × E. Consequently the asymptotic null distribution of Ts is difficult to obtain.

We therefore develop an approximate score statistic for one-sided hypotheses, in the same spirit as the approximate likelihood ratio statistic for a normal mean vector with nonnegative components.27 The main advantage is that, after transformation and approximation, the asymptotic null distribution of the score statistic does not rely on the nuisance parameters involved in Iθθ,β, and therefore the significance level is much easier to compute, an appealing feature for genome-wide testing.

Let A be a p × p matrix such that

$$A^T A = I_{\theta\theta,\beta},$$ (Equation 3)

and so z = AU is asymptotically a normal vector with the identity covariance matrix. Such an A is not unique; we discuss the choice of A below. For a chosen A, the score statistic Ts becomes

$$n\left[z^T z - \inf\{(z - A\delta)^T (z - A\delta) : \delta \le 0\}\right].$$

Let b = Aδ. Geometrically, the parameter space under the alternative is the negative orthant. To compute Ts, one has to minimize $(z - b)^T (z - b)$ over all vectors in the image space $A(\delta) = \{A\delta \mid \delta \le 0\}$, which can be cumbersome. Instead, the proposed test approximates the polyhedral space A(δ) by the negative orthant $\{b \mid b \le 0\}$, with A chosen to make the center directions of the two spaces coincide. For the jth element in z, denoted by $z_j$: if $z_j \ge 0$, then $\inf\{(z_j - b_j)^2 : b_j \le 0\} = z_j^2$; if $z_j < 0$, then $\inf\{(z_j - b_j)^2 : b_j \le 0\} = 0$. Because $z_j$ and $z_{j'}$ are independent for $j \ne j'$, it follows that the approximate score statistic for the one-sided hypothesis H1b is

$$T_b = \sum_{j=1}^{p} (z_j \wedge 0)^2,$$

where $z_j \wedge 0$ denotes the minimum of $z_j$ and 0. Because of the independence among the $z_j$ values, the asymptotic null distribution is a special case of the chi-bar-squared distribution,27 expressed as $\sum_{j=0}^{p} [\{p!/j!(p-j)!\}/2^p]\,\chi_j^2$, where $\chi_j^2$ is the chi-square distribution with j degrees of freedom and $\chi_0^2$ is defined as the constant zero.
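The one-sided statistic and its chi-bar-squared p value can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the input `z` is assumed to be the already-transformed vector z = AU, and the chi-square tail probability is computed with the standard finite-series expressions for integer degrees of freedom so that only the standard library is needed.

```python
import math


def chi2_sf(x, df):
    """P(chi-square_df >= x) for integer df >= 0; df = 0 is the constant zero."""
    if x <= 0:
        return 1.0
    if df == 0:
        return 0.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) exp(-x/2) * sum_r x^(r-1)/(2r-1)!!
    term, total = 1.0, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return (math.erfc(math.sqrt(half))
            + math.sqrt(2.0 * x / math.pi) * math.exp(-half) * total)


def one_sided_stat(z):
    """T_b: sum of squared negative parts of z."""
    return sum(min(v, 0.0) ** 2 for v in z)


def chibar_pvalue(t, p):
    """Tail probability of the binomial(p, 1/2) mixture of chi-square_j, j = 0..p."""
    return sum(math.comb(p, j) / 2 ** p * chi2_sf(t, j) for j in range(p + 1))


# Hypothetical transformed scores for p = 3 exposures.
z = [-2.0, 1.0, -1.0]
t_b = one_sided_stat(z)          # only the two negative components contribute
p_value = chibar_pvalue(t_b, len(z))
```

Because the mixture weights are binomial probabilities, no nuisance parameter enters the p value calculation, which is the computational appeal described above.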

The degree of approximation depends on the information matrix Iθθ,β. If it is diagonal, A can be chosen such that A(δ) is exactly the negative orthant. For an arbitrary information matrix, we can enhance the approximation by choosing the transformation A so that the center direction of A(δ), which is defined to be the direction that forms the same acute angle with any of the edge directions, coincides with the center direction of the negative orthant. For the negative orthant in the two-dimensional space, for instance, the center direction is (−1,−1). For a given A, the p edges of the polyhedral cone are given by

$$f_j = A e_j / (e_j^T I_{\theta\theta,\beta}\, e_j)^{1/2},$$

where j = 1, 2, …, p and $e_j$ is the unit vector for the jth coordinate. The center direction is then given by

$$d_A = \{(f_1, \ldots, f_p)^T\}^{-1} J,$$

where $J = (-1, \ldots, -1)^T$ is the center direction of the negative orthant. Therefore the center direction of A(δ) is required to be

$$d_A = cJ,$$ (Equation 4)

for some c > 0. In Appendix A, we give the details of how to construct A to ensure A(δ) has the same center direction as the negative orthant.

Following the same derivation, the score statistic for the other one-sided test, θj ≥ 0 for j = 1, …, p, can be constructed as $\sum_{j=1}^{p} (z_j \vee 0)^2$, where $z_j \vee 0$ denotes the maximum of $z_j$ and 0. The score statistic for the bidirectional hypothesis H1c is therefore the maximum of the two test statistics,

$$T_c = \max\left(\sum_{j=1}^{p} (z_j \wedge 0)^2,\ \sum_{j=1}^{p} (z_j \vee 0)^2\right).$$

The test statistic now depends on the angle of z in orthants other than the positive and negative ones, and so the asymptotic null distribution is no longer a chi-bar-squared distribution. Since under the null hypothesis z is a vector of independent standard normal variables, the null distribution of Tc can be generated by simulation. For genome-wide testing, however, an enormous number of simulations is needed to achieve the accuracy required for genome-wide significance (5 × 10^−8). Though this requires the generation of a large number of simulated statistics (on the order of 10^9 to 10^10) only once, it is rather cumbersome for genome-wide testing. We propose a simple hybrid procedure that computes p values with adequate accuracy: first, a p value is computed by simulation to the accuracy of the third decimal place; if the simulation-based p value is less than 0.01, then we further compute an approximate, conservative p value by algebraic calculation. The details of the algebraic approximation are shown in Appendix A. In Figures S1 and S2, we show some numerical analyses comparing simulation-based p values and algebraic approximations. This algebraic approximation works remarkably well: the smaller the p value, the better the approximation.
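The simulation step of this hybrid procedure might look like the following Python sketch. It covers only the Monte Carlo part; the algebraic tail approximation from Appendix A is not reproduced here. The function names, the seed, and the add-one correction in `mc_pvalue` are choices made for this illustration rather than details given in the text.

```python
import bisect
import random


def bidirectional_stat(z):
    """T_c: the larger of the negative-part and positive-part sums of squares."""
    neg = sum(min(v, 0.0) ** 2 for v in z)
    pos = sum(max(v, 0.0) ** 2 for v in z)
    return max(neg, pos)


def simulate_null(p, n_sim, seed=0):
    """Sorted draws of T_c when z has p independent standard normal entries."""
    rng = random.Random(seed)
    draws = [bidirectional_stat([rng.gauss(0.0, 1.0) for _ in range(p)])
             for _ in range(n_sim)]
    draws.sort()
    return draws


def mc_pvalue(t, sorted_null):
    """Share of null draws >= t, with an add-one correction (an assumption here)."""
    n_ge = len(sorted_null) - bisect.bisect_left(sorted_null, t)
    return (n_ge + 1) / (len(sorted_null) + 1)


# The null draws are generated once and reused for every SNP.
null_draws = simulate_null(p=3, n_sim=20000)
p_sim = mc_pvalue(bidirectional_stat([-2.0, 1.0, -1.0]), null_draws)
use_algebraic_tail = p_sim < 0.01  # hand off to the algebraic approximation
```

With a few thousand to a few hundred thousand draws, this resolves p values to roughly the third decimal place, which is all the first stage of the hybrid procedure requires.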

Testing for Proportional Interactions

Relative to the one-sided test, a more restrictive alternative hypothesis for multiple G × E in Equation 2 entails

$$H_{1d}:\ \theta_j = \gamma \beta_{2j} \ \text{ for some } \gamma \neq 0 \text{ and } j = 1, \ldots, p,$$

so that on the log-odds scale, the interactions are proportional to the exposure main associations. One way to interpret this hypothesis is as follows: because of the shared etiology, the combined effect of environmental risk factors can be formulated as an environmental risk score, similar to the genetic risk scores in common use.33 The genetic variant modifies the disease association of this environmental risk score as a whole. This test is slightly different from Tukey’s 1-df test because the genetic main association, β1, does not factor into the interactions.28 For genome-wide association studies, β1 and γ will be 0 for the majority of loci. Given that the exposures are all established risk factors, i.e., $\beta_{2j} \neq 0$, the interaction parameter γ is identifiable even when there is no genetic main effect, i.e., $\beta_1 = 0$, avoiding the troublesome identifiability issue.30

The parameters are reduced to the vector $\alpha = (\beta, \gamma)$. The standard Rao score statistic for testing $\gamma = 0$, however, shows severely inflated type I error rates in the BEACON genome-wide association study (see Figure S3, the quantile-quantile plot). The problem originates in the unique score function derived from the proportional interaction model. The usual expressions for the mean and variance of $\tilde S_\gamma$ need to be corrected for the finite-sample bias, as we derive below.

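Let $\mathbf{X}_i = (1, G_i, X_i, W_i)$ be the full vector of covariates for participant i, and let $\tilde\beta$ denote the maximum likelihood estimator for β under the null hypothesis. The score functions for β and γ under the null hypothesis can be written as

$$\tilde S_\beta = \sum_i \mathbf{X}_i \left(Y_i - \frac{\exp(\mathbf{X}_i \tilde\beta)}{1 + \exp(\mathbf{X}_i \tilde\beta)}\right), \quad \tilde S_\gamma = \sum_i \sum_j \tilde\beta_{2j} G_i X_{ij} \left(Y_i - \frac{\exp(\mathbf{X}_i \tilde\beta)}{1 + \exp(\mathbf{X}_i \tilde\beta)}\right).$$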
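To make the structure of the score for γ concrete, the following Python sketch evaluates it for given null estimates: each participant contributes the genotype times the estimated environmental risk score, multiplied by the residual from the null model. The function name and argument layout are hypothetical, and the null estimates are assumed to come from a separately fitted logistic model; this is the uncorrected score only, without the finite-sample correction discussed in this section.

```python
import math


def proportional_score(y, g, x, w, beta0, beta1, beta2, beta3):
    """Uncorrected score for gamma under the proportional-interaction model.

    y, g: outcomes (0/1) and genotype scores; x, w: per-subject lists of
    exposures and adjustment covariates; beta*: estimates under gamma = 0.
    """
    score = 0.0
    for yi, gi, xi, wi in zip(y, g, x, w):
        # Environmental risk score: sum_j beta2_j * X_ij.
        risk = sum(b * v for b, v in zip(beta2, xi))
        eta = beta0 + beta1 * gi + risk + sum(b * v for b, v in zip(beta3, wi))
        mu = 1.0 / (1.0 + math.exp(-eta))  # fitted probability under the null
        score += gi * risk * (yi - mu)
    return score


# One hypothetical case subject carrying one minor allele, exposed to X1 only.
s_gamma = proportional_score(
    y=[1], g=[1], x=[[1.0, 0.0]], w=[[]],
    beta0=0.0, beta1=0.0, beta2=[0.7, 0.7], beta3=[],
)
```

Note that non-carriers (G = 0) contribute nothing to this score, so at low minor allele frequencies it rests on relatively few participants.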

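Let $\tilde H_{\gamma\beta} = \partial \tilde S_\gamma / \partial\beta$. Asymptotic distribution theory for the maximum likelihood estimator $\tilde\beta$ leads to $\sqrt{n}(\tilde\beta - \beta) = \sqrt{n}\, \tilde I_{\beta\beta}^{-1} \tilde S_\beta + o_p(1)$, and so the usual first-order Taylor expansion of $(1/\sqrt{n})\, \tilde S_\gamma$ at β yields

$$\frac{1}{\sqrt{n}} \tilde S_\gamma = \frac{1}{\sqrt{n}} S_\gamma + \frac{1}{\sqrt{n}} \tilde H_{\gamma\beta} (\tilde\beta - \beta) + o_p(1) = \frac{1}{\sqrt{n}} S_\gamma + \frac{1}{\sqrt{n}} \tilde H_{\gamma\beta} \tilde I_{\beta\beta}^{-1} \tilde S_\beta + o_p(1).$$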

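The classical derivation for the large-sample distribution of a score statistic for a composite null hypothesis usually suffices. In this case, however, the first-order Taylor expansion turns out to be inadequate for a genome-wide significance level of 5 × 10^−8. More importantly, $\tilde H_{\gamma\beta}$ for this proportional interaction model is a function of the outcome variable Y,

$$\tilde H_{\gamma\beta} = \sum_i \left\{ -\mathbf{X}_i \left(\sum_j \tilde\beta_{2j} G_i X_{ij}\right) \mu_i (1 - \mu_i) + V_i (Y_i - \mu_i) \right\},$$

where $\mathbf{X}_i = (1, G_i, X_i, W_i)$ is the full covariate vector, $\mu_i = \exp(\mathbf{X}_i \tilde\beta) / \{1 + \exp(\mathbf{X}_i \tilde\beta)\}$, and $V_i = (0, 0, G_i X_i, 0)$. This induces correlation between $\tilde H_{\gamma\beta}$ and $\tilde S_\beta$ which, if ignored, will result in bias when calculating the mean and the variance of score statistics in finite samples, and consequently in inflated type I errors. For example, the expectation of $\tilde S_\gamma$ is no longer 0 in finite samples. We show in Figure S3 a heavily inflated quantile-quantile plot for the standard score test for the proportional interactions. In Appendix A, we show the algebraic development of the finite-sample correction for the mean (denoted by $\tau_d$) and the variance (denoted by $\Sigma_d$) of $\tilde S_\gamma$ after a second-order Taylor expansion,

$$\frac{1}{\sqrt{n}} \tilde S_\gamma = \frac{1}{\sqrt{n}} S_\gamma + \frac{1}{\sqrt{n}} \tilde H_{\gamma\beta} (\tilde\beta - \beta) + \frac{1}{2\sqrt{n}} (\tilde\beta - \beta)^T \tilde H_{\gamma\beta\beta} (\tilde\beta - \beta) + o_p\left(\frac{1}{\sqrt{n}}\right),$$

where $\tilde H_{\gamma\beta\beta}$ is the second derivative of $\tilde S_\gamma$ with respect to β. The corrected score statistic for testing γ = 0 is therefore expressed as

$$T_d = (\tilde S_\gamma - \tau_d)\, \Sigma_d^{-1}\, (\tilde S_\gamma - \tau_d).$$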

The first-order correction is simpler but turns out to be inadequate to correct for the inflated type I error (Figure S4). In our simulation study and application to the BEACON data shown next, the second-order correction expressed as Td shows proper control of the type I error in genome-wide significance.

Results

Simulation

We conducted numerous simulations to evaluate the performance of the proposed score statistics. Specifically, a genotype (coded as 0, 1, 2) was generated with minor allele frequency 0.1. Three exposure variables were generated by a multivariate normal distribution with zero means and a covariance matrix (either a diagonal matrix with diagonal elements 1 or a compound symmetric matrix with diagonal elements 1 and off-diagonal elements 0.6). The second and third exposure variables were dichotomized at zero. A logistic model was used to generate a dichotomous disease outcome conditional on the genotype and the exposures. The logarithm of the odds ratio was set to −0.5 for the genotype and (0.7, 0.7, 0.7) for the three exposures. The intercept was set to −4 to generate a disease probability of approximately 5%, under the null hypothesis that none of the exposures interacts with the genotype.
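The data-generating design under the null hypothesis can be sketched in Python as follows. Several details are assumptions made for this illustration and are not stated in the text: genotypes are drawn under Hardy-Weinberg equilibrium, dichotomized exposures are coded 1 when the underlying normal variable exceeds zero, the compound symmetric correlation is induced through a shared normal factor, and the function names and seeds are hypothetical.

```python
import math
import random


def simulate_cohort(n, rho=0.0, maf=0.1, seed=1):
    """Simulate (y, g, x) triples under the null of no G x E."""
    rng = random.Random(seed)
    cohort = []
    for _ in range(n):
        # Genotype 0/1/2 as the count of minor alleles (Hardy-Weinberg assumed).
        g = sum(rng.random() < maf for _ in range(2))
        # A shared factor gives pairwise correlation rho between latent exposures.
        shared = rng.gauss(0.0, 1.0)
        x = [math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
             for _ in range(3)]
        # Second and third exposures are dichotomized at zero.
        x[1] = 1.0 if x[1] > 0 else 0.0
        x[2] = 1.0 if x[2] > 0 else 0.0
        # Intercept -4, log odds ratio -0.5 for G and 0.7 for each exposure.
        eta = -4.0 - 0.5 * g + 0.7 * sum(x)
        y = 1 if rng.random() < 1.0 / (1.0 + math.exp(-eta)) else 0
        cohort.append((y, g, x))
    return cohort


def case_control_sample(cohort, seed=2):
    """All case subjects plus an equal number of randomly sampled controls."""
    rng = random.Random(seed)
    cases = [row for row in cohort if row[0] == 1]
    controls = [row for row in cohort if row[0] == 0]
    return cases + rng.sample(controls, len(cases))


cohort = simulate_cohort(20000)            # independent exposures
study = case_control_sample(cohort)
prevalence = sum(row[0] for row in cohort) / len(cohort)
```

Setting `rho=0.6` gives the correlated-exposure scenario; the case-control sampling step mirrors the design used for the type I error evaluation described in this section.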

The two correlation structures between exposures generate different levels of correlation in the score functions for the three interaction parameters: a moderate level of correlation for each pair of score functions (<0.5) or a high correlation for each pair of score functions (>0.5). The correlation would affect the performance of the one-sided score test and the bidirectional test, because both are based on an approximation of the correlation-transformed orthant.27

To evaluate the type I error rate, a cohort was first generated from these probability distributions with a ladder of sample sizes (10^4, 5 × 10^4, 10^5). All case subjects were sampled for simulated genotyping, together with the same number of randomly sampled control subjects. Table 1 shows empirical type I error rates evaluated in 5 × 10^5 datasets simulated under the null hypothesis, when the nominal p value cut-off was set to 0.05, 0.001, or 0.0001. The number of case subjects varies from 500 to 5,000, with the same number of control subjects. The upper part shows the scenario where the three environmental exposures are independent and the lower part shows the scenario where they are correlated. In all scenarios, the Bonferroni correction for three separate tests and the classical score test for the unconstrained alternative H1a show adequate performance in controlling type I errors. The one-sided test for H1b shows proper control of type I error rates except for slight inflation at significance levels 0.001 and 0.0001 with 500 case subjects. The increased correlation hurts the performance of the one-sided score statistic. The bidirectional score statistic for H1c shows superior control of the type I error rate, because it uses a conservative approximation in the tail of the p value distribution. With a relatively small sample size, the naive score statistic for H1d displays grossly inflated type I error rates when the nominal significance level is 0.001 or 0.0001, about 4 times or 14 times the correct level when the exposures are independent. This inflation dissipates when the sample size increases, suggesting that the problem is largely driven by erroneous small-sample properties, not the large-sample behavior. In comparison, the corrected score statistic for H1d shows adequate control of type I error rates for all settings.

In genome-wide testing for gene-environment interactions, the power is typically small at the genome-wide significance level (5 × 10^−8), and so the inflation of the type I error rate for the score statistic for H1d can be concerning. See, for example, the quantile-quantile plot for the BEACON data in Figure S3. As we show in the data analysis later, the correction we proposed for H1d is necessary to preserve adequate control of type I error.

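Table 1. The Ratio of the Observed Type I Error Rate versus the Nominal Value for the Bonferroni Test, the Unconstrained Score Test, and the Constrained Score Tests

Independent Exposures

| n Cases/n Controls | p Value Level | Bonferroni | Unconstrained | One-Sided | Bidirectional | Proportional (Standard) | Proportional (Corrected) |
|---|---|---|---|---|---|---|---|
| 500/500 | 0.05 | 0.99 | 1.01 | 1.10 | 1.02 | 1.17 | 0.88 |
| 500/500 | 0.001 | 0.97 | 1.01 | 1.57 | 1.07 | 4.40 | 1.05 |
| 500/500 | 0.0001 | 0.88 | 1.14 | 1.98 | 1.32 | 14.24 | 1.50 |
| 2,500/2,500 | 0.05 | 0.98 | 1.00 | 1.03 | 1.00 | 1.04 | 0.98 |
| 2,500/2,500 | 0.001 | 0.94 | 1.05 | 1.25 | 1.07 | 1.50 | 1.04 |
| 2,500/2,500 | 0.0001 | 0.98 | 1.04 | 1.34 | 1.10 | 2.68 | 1.40 |
| 5,000/5,000 | 0.05 | 0.98 | 0.99 | 1.02 | 0.99 | 1.01 | 0.99 |
| 5,000/5,000 | 0.001 | 0.96 | 1.00 | 1.14 | 0.97 | 1.25 | 1.00 |
| 5,000/5,000 | 0.0001 | 0.76 | 1.02 | 1.18 | 1.02 | 1.56 | 1.00 |

Correlated Exposures

| n Cases/n Controls | p Value Level | Bonferroni | Unconstrained | One-Sided | Bidirectional | Proportional (Standard) | Proportional (Corrected) |
|---|---|---|---|---|---|---|---|
| 500/500 | 0.05 | 0.94 | 0.99 | 1.09 | 0.99 | 1.10 | 0.91 |
| 500/500 | 0.001 | 1.03 | 1.01 | 1.65 | 1.10 | 3.08 | 1.18 |
| 500/500 | 0.0001 | 1.34 | 1.06 | 2.12 | 1.34 | 9.62 | 2.06 |
| 2,500/2,500 | 0.05 | 0.98 | 1.01 | 1.04 | 1.01 | 1.03 | 0.99 |
| 2,500/2,500 | 0.001 | 1.04 | 0.99 | 1.38 | 1.06 | 1.48 | 1.10 |
| 2,500/2,500 | 0.0001 | 0.92 | 0.80 | 1.54 | 0.90 | 1.92 | 0.98 |
| 5,000/5,000 | 0.05 | 0.97 | 0.99 | 1.01 | 0.99 | 1.01 | 0.99 |
| 5,000/5,000 | 0.001 | 0.99 | 1.02 | 1.21 | 0.94 | 1.17 | 0.99 |
| 5,000/5,000 | 0.0001 | 1.12 | 0.94 | 1.34 | 1.16 | 1.38 | 1.04 |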

In Figures 1 and 2 we show the power performance for the five methods, in which the Bonferroni correction and the 3-df score test are the benchmarks of the comparison. We consider two scenarios: 500 case and 500 control subjects, significance level 0.05 (Figure 1), and 2,500 case and 2,500 control subjects, significance level 0.0001 (Figure 2). In each scenario, we generate four different sets of interaction parameters: proportional, one-sided (negative), two-sided, and only one non-zero interaction. The latter two sets are designed to test the robustness of the constrained methods when respective modeling assumptions are violated. We show the settings where the exposures are independent and omit the settings with correlated exposures, since the relative comparison between methods did not differ substantially from Figures 1 and 2.

Figure 1.


Statistical Power of the Four Methods Being Evaluated in Four Scenarios with a Small Sample Size

500 case subjects and 500 control subjects; there is no correlation between environmental exposures.

(A) The three interactions are proportional to respective main effects.

(B) The three interactions are not proportional, but one-sided.

(C) The three interactions have different directions.

(D) Two of the three interactions are zero.

Figure 2.


Statistical Power of the Four Methods Being Evaluated in Four Scenarios with a Large Sample Size

2,500 case subjects and 2,500 control subjects; there is no correlation between environmental exposures.

(A) The three interactions are proportional to respective main effects.

(B) The three interactions are not proportional, but one-sided.

(C) The three interactions have different directions.

(D) Two of the three interactions are zero.

Figure 1 shows power performance of various methods for 500 case subjects and 500 control subjects with the significance level 0.05. In Figure 1A, the interaction parameters were set to be truly proportional to the respective main effects, namely γβ2j with γ increasing from 0 to 1. Clearly, the Bonferroni method performs the worst among the five methods evaluated, losing 5%–10% power when compared to the unconstrained 3-df score test. The one-sided (negative) score statistic for H1b shows the best performance in power, consistently higher over the unconstrained score test, with the maximal power gain reaching 15%–20%. The proportional score statistic with finite-sample correction shows the second best performance, slightly better than the bidirectional test. The advantage of the one-sided test over the proportional and the bidirectional test is expected, because the latter two examine differences in both positive and negative directions.

Figure 1B shows the setting in which the interactions are truly one-sided but not proportional to the main exposure effects, specifically, θj = γ(−0.75, −0.5, −0.25) with γ increasing from 0 to 1 and the main effects are (0.7, 0.7, 0.7). The order of the performance remains the same as that in Figure 1A, though the difference is smaller. The one-sided score test performs the best in this scenario, whereas the proportional interaction test and the bidirectional test appear to be next in power performance. The Bonferroni method yields nearly identical power as the unconstrained score test.

Figure 1C shows a scenario where the interactions neither have the same sign nor are proportional to the main exposure effects. Specifically, θj = γ(−0.5, −0.75, −0.25) with γ increasing from 0 to 1. This is the setting where the modeling assumptions for the three constrained score tests are violated. Consequently, the unconstrained test and the Bonferroni correction perform better than the constrained tests. The one-sided (negative) test and the bidirectional test show some degree of robustness, with similarly lower power performance. The proportional test delivers a substantially inferior power performance relative to the other four tests.

In Figure 1D, we simulate a scenario where two out of the three interactions are 0, θj = γ(0, −1.0, 0) with γ increasing from 0 to 1. In this scenario, the Bonferroni method outperforms the unconstrained score test; the bidirectional test is similar to the constrained test; the proportional interaction test performs very poorly because this alternative hypothesis is not at all in the direction for which it is designed. Interestingly, the one-sided test still delivers the best power performance, because these interaction parameter values still conform to the alternative hypothesis H1b that the interactions lie in the negative orthant.

Figure 2 shows the settings with 2,500 case and 2,500 control subjects and a p value cut-off of 0.0001. We want to examine the performance of the proposed methods in a large-scale case-control study with a large number of genotypes, like the one in BEACON. A pattern similar to that in Figure 1 was observed in the power comparison: when the alternative hypothesis is directional or proportional, the three constrained score statistics perform better than the standard 3-df test and the Bonferroni correction; notably, the proportional interaction test outperforms the one-sided test, suggesting that pooling three exposures into a risk score for interaction boosts the power when the significance level is far out in the tail; the one-sided test and the bidirectional test can be quite robust against model violation, whereas the performance of the proportional test drops substantially when the interactions under the alternative hypothesis are not in the same direction. We also note that another feature affecting the performance of the proportional test is the size of the main exposure effects (Figure S5). Strong exposure effects and a large sample size, as for the three exposures in the BEACON study, work best for the proportional test, because its performance depends on the variability of the main effect estimates. Based on these simulations, it is compelling to apply the three constrained score tests and the unconstrained score test in the BEACON study.

Data Application

All 1,516 esophageal adenocarcinoma case individuals and 2,416 Barrett’s esophagus individuals in the discovery phase of the BEACON GWAS were included in this investigation, together with 2,187 control participants. A description of the genome-wide study has been published previously.23 All recruited participants gave informed consent, and this study was approved by the ethics boards of each participating institution. Genetic data for which the authors have IRB permission to make public are available from the dbGaP database (accession numbers phs000869.v1.p1 and phs000187.v1.p1). Three established risk factors were investigated for potential interaction with 922,031 SNPs that were genotyped by the Illumina HumanOmni1-Quad platform and passed quality control: cigarette smoking, body-mass index (BMI), and gastresophageal reflux symptoms. These variables were coded as ever smoking (yes or no), BMI (<25, ≥25 to <30, ≥30 kg/m²), and at least weekly heartburn or weekly reflux (yes or no). A logistic model was fitted for every SNP and the three exposures, separately for EA and BE. Each model included age, sex, the first four principal components derived from genome-wide SNP data to account for population stratification, the SNP main effect (coded as two indicator variables), the three exposure main effects (all coded categorically), and the product of the SNP value (0, 1, or 2, treated quantitatively) and each of the three exposures. The multivariate SNP-exposure interactions were tested by the unconstrained score test, the one-sided test, the bidirectional test, and the proportional test.
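As a rough illustration of the model specification just described, the design matrix for a single SNP could be assembled as below. This is a sketch, not the authors' code: the function name, argument names, and toy data are hypothetical, and entering BMI into the product term as a 0/1/2 category score (so that each exposure contributes one interaction parameter, matching the 3-df test) is an assumption, since the text does not spell out that coding.

```python
import numpy as np


def build_design_matrix(age, sex, pcs, snp, smoke, bmi_cat, gerd):
    """Assemble the covariate matrix for one SNP (hypothetical helper).

    snp     : minor-allele count (0, 1, or 2)
    smoke   : ever smoking (0/1)
    bmi_cat : BMI category (0: <25, 1: >=25 to <30, 2: >=30 kg/m^2)
    gerd    : at least weekly heartburn or reflux (0/1)
    pcs     : (n, 4) array of principal components
    """
    snp = np.asarray(snp, dtype=float)
    smoke = np.asarray(smoke, dtype=float)
    bmi_cat = np.asarray(bmi_cat, dtype=float)
    gerd = np.asarray(gerd, dtype=float)
    pcs = np.asarray(pcs, dtype=float)

    columns = [
        ("intercept", np.ones_like(snp)),
        ("age", np.asarray(age, dtype=float)),
        ("sex", np.asarray(sex, dtype=float)),
    ]
    columns += [(f"pc{k + 1}", pcs[:, k]) for k in range(pcs.shape[1])]
    # SNP main effect: two indicator variables (one or two minor alleles).
    columns += [("snp_1", (snp == 1).astype(float)),
                ("snp_2", (snp == 2).astype(float))]
    # Exposure main effects, coded categorically.
    columns += [("smoke", smoke),
                ("bmi_25_to_30", (bmi_cat == 1).astype(float)),
                ("bmi_30_plus", (bmi_cat == 2).astype(float)),
                ("gerd", gerd)]
    # Interactions: allele count (quantitative) times each exposure.
    # ASSUMPTION: BMI enters the product as its 0/1/2 category score.
    columns += [("snp_x_smoke", snp * smoke),
                ("snp_x_bmi", snp * bmi_cat),
                ("snp_x_gerd", snp * gerd)]

    names = [name for name, _ in columns]
    X = np.column_stack([col for _, col in columns])
    return X, names


# Toy data, purely illustrative.
rng = np.random.default_rng(1)
n = 8
age = rng.uniform(40, 80, size=n)
sex = rng.integers(0, 2, size=n)
pcs = rng.normal(size=(n, 4))
snp = rng.integers(0, 3, size=n)
smoke = rng.integers(0, 2, size=n)
bmi_cat = rng.integers(0, 3, size=n)
gerd = rng.integers(0, 2, size=n)

X, names = build_design_matrix(age, sex, pcs, snp, smoke, bmi_cat, gerd)
print(X.shape, names[-3:])
```

The last three columns carry the interaction parameters that the score tests examine jointly; the remaining columns are nuisance terms estimated under the null model.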

Figure 3 shows the q-q plots for the four genome-wide G × E testing methods for Barrett’s esophagus. The one-sided results for the positive interaction for Barrett’s esophagus, as well as all G × E tests for esophageal adenocarcinoma, did not yield any significant results and therefore are not presented. Figure 3A shows that the unconstrained 3-df score statistic yields two genome-wide significant SNPs on chromosome 7 that are highly correlated (r² = 1; rs11765529, p value 3.48 × 10⁻⁹; rs7798662, p value 6.06 × 10⁻⁹). The same two SNPs remain genome-wide significant when tested by one-sided (negative) score statistics, the bidirectional test, and the proportional interaction test shown in Figures 3B–3D. The proportional interaction test resulted in one additional genome-wide significant SNP, rs4930068 on chromosome 11 (p value 3.51 × 10⁻⁸, Figure 3D). In Figure S3, we show the q-q plot with seriously inflated type I error rates when the finite-sample correction is not applied to score statistics for proportional interactions.

Figure 3.

The q-q Plots for Genome-wide Testing for G × E on Barrett’s Esophagus by the Three Score Statistics

(A) Unconstrained score test.

(B) One-sided score test for negative G × E.

(C) Bidirectional test.

(D) Score test for proportional G × E.

Table 2 shows the estimated odds ratios of the three environmental factors stratified by the number of alleles for the two genome-wide significant SNPs. Barrett’s esophagus and esophageal adenocarcinoma were investigated as cases separately in logistic regression models with three SNP-environment interactions. The interpretation of these ORs is conditional on genotype, age, gender, and the other two exposures being fixed at certain levels. The right half of Table 2 shows the p values from the various score tests for G × E. For almost every combination of SNP, exposure, and disease, the OR decreases as the number of minor alleles increases. This decrease is most strongly seen for GERD, the strongest risk factor, and the trend we found for Barrett’s esophagus is similar to that for esophageal adenocarcinoma, though the latter does not reach genome-wide significance. Since both the cases of Barrett’s esophagus and the cases of esophageal adenocarcinoma were compared to the same control group, the findings for esophageal adenocarcinoma are not independent replications. Nonetheless, the consistency between Barrett’s esophagus and esophageal adenocarcinoma adds to the evidence for true G × E in the etiological pathway.

Table 2.

The Odds Ratio of the Three Environmental Factors Associated with SNP Genotypes for Barrett’s Esophagus and Esophageal Adenocarcinoma, and the Three Score Tests for Multiple G × E Simultaneously

| SNP | Disease | Risk factor | OR, 0 minor alleles | OR, 1 minor allele | OR, 2 minor alleles | G × E p value, unconstrained | G × E p value, one-sided | G × E p value, bidirectional | G × E p value, proportional |
|---|---|---|---|---|---|---|---|---|---|
| rs11765529 | BE | BMI | 1.94 (1.74, 2.17) | 1.64 (1.24, 2.16) | 1.28 (0.08, 19.7) | | | | |
| | | smoking | 1.67 (1.41, 1.98) | 0.79 (0.51, 1.21) | 0.94 (0.80, 11.1) | | | | |
| | | GERD | 6.39 (5.35, 7.64) | 2.09 (1.36, 3.21) | 0.69 (0.06, 8.09) | 3.48 × 10⁻⁹ | 7.10 × 10⁻¹⁰ | 1.42 × 10⁻⁹ | 1.81 × 10⁻⁹ |
| | EA | BMI | 1.51 (1.31, 1.74) | 1.17 (0.81, 1.69) | | | | | |
| | | smoking | 1.87 (1.50, 2.35) | 1.20 (0.64, 2.27) | | | | | |
| | | GERD | 3.83 (3.07, 4.77) | 2.34 (1.33, 4.11) | | 0.013 | 0.004 | 0.0072 | 0.003 |
| rs4930068 | BE | BMI | 2.11 (1.80, 2.49) | 1.91 (1.64, 2.22) | 1.32 (1.01, 1.72) | | | | |
| | | smoking | 1.56 (1.22, 1.99) | 1.73 (1.38, 2.18) | 0.85 (0.56, 1.28) | | | | |
| | | GERD | 7.91 (6.08, 10.30) | 4.75 (3.76, 6.02) | 2.86 (1.88, 4.35) | 1.53 × 10⁻⁶ | 3.38 × 10⁻⁷ | 6.76 × 10⁻⁷ | 3.51 × 10⁻⁸ |
| | EA | BMI | 1.46 (1.19, 1.79) | 1.43 (1.17, 1.74) | 1.41 (0.99, 1.99) | | | | |
| | | smoking | 2.14 (1.54, 2.99) | 1.67 (1.23, 2.27) | 1.08 (0.60, 1.95) | | | | |
| | | GERD | 4.80 (3.48, 6.60) | 3.04 (2.26, 4.08) | 2.21 (1.24, 3.92) | 0.007 | 0.002 | 0.004 | 0.004 |

Abbreviations are as follows: BE, Barrett’s esophagus; EA, esophageal adenocarcinoma.
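Because the fitted models include the product of the allele count and each exposure, the exposure OR at genotype g is the OR at g = 0 multiplied by a per-allele interaction OR raised to the power g. The short sketch below checks this log-linear pattern against the GERD rows of Table 2; the function names are hypothetical, and the rounded table entries are used as inputs, so agreement is only approximate.

```python
def per_allele_interaction_or(or_g0, or_g1):
    """Interaction OR per minor allele implied by the ORs at 0 and 1 alleles."""
    return or_g1 / or_g0


def predicted_exposure_or(or_g0, interaction_or, n_alleles):
    """Exposure OR at a given allele count under a log-linear interaction."""
    return or_g0 * interaction_or ** n_alleles


# GERD and Barrett's esophagus, ORs at 0, 1, and 2 minor alleles (Table 2).
gerd_be = {
    "rs11765529": (6.39, 2.09, 0.69),
    "rs4930068": (7.91, 4.75, 2.86),
}

predictions = {}
for snp_id, (or0, or1, or2) in gerd_be.items():
    ratio = per_allele_interaction_or(or0, or1)
    predictions[snp_id] = predicted_exposure_or(or0, ratio, 2)
    print(snp_id, "per-allele interaction OR:", round(ratio, 3),
          "predicted OR at 2 alleles:", round(predictions[snp_id], 2),
          "tabulated:", or2)
```

For both SNPs the OR predicted at two minor alleles (about 0.68 and 2.85) is close to the tabulated value (0.69 and 2.86), as expected when the stratified ORs are derived from a single interaction coefficient per exposure.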

Association analyses were further conducted for imputed SNPs based on the 1000 Genomes Project around genotyped SNPs that showed significant interactions with one of the three risk factors. The details of the imputation procedure have been presented previously.23 Figure 4 shows the regional interaction associations around rs11765529 and rs4930068, including both genotyped and imputed SNPs within a 1 Mb interval centering around the index SNPs. The index SNPs are marked as purple squares, other genotyped SNPs are marked with squares, and imputed SNPs are marked by circles. The rs11765529 variant is located in a gene-poor intergenic region, ∼160–180 kb from the nearest protein-coding gene, POM121L12 (transmembrane nucleoporin-like 12 [MIM: 615753]). The rs4930068 variant at 11p15.5 is located 5.4 kb upstream (5′) of the transcriptional start site for ASCL2 (MIM: 601886). None of the adjacent SNPs showed more significant p values than the index SNPs.

Figure 4.

Regional Association Plot of Genotyped and Imputed SNPs in Proximity to Two Newly Discovered SNPs with Genome-wide Significance

The top portion of the figure has physical position along the x axis, and −log₁₀(p value) for the interaction term on the y axis. The index SNP is marked as a purple square, other genotyped SNPs are marked with squares, and imputed SNPs are marked by circles. The color scheme represents the pairwise correlation (r²) between a given SNP and the index SNP. The bottom portion of the figure shows the position of genes across the region.

(A) Genotyped and imputed SNPs around rs11765529; the p values are from the unconstrained score test.

(B) Genotyped and imputed SNPs around rs4930068; the p values are from the score test for proportional interactions.

We conducted in silico functional characterization of these SNPs in several bioinformatics databases. The two variants at 7p12.1 (rs7798662 and rs11765529) are situated in a gene-poor intergenic region, ∼160–180 kb from the nearest protein-coding gene, POM121L12 (Figure S6A). These two SNPs are separated by ∼24 kb, are in strong linkage disequilibrium (LD) (r² = 1), and are highly correlated (r² = 1) with >100 additional variants in a 40-kb region, based on data from the 1000 Genomes Project34 (Table S1A). rs7798662 modifies a predicted binding motif for the transcription factor Smad3 (MIM: 603109), whereas rs11765529 alters motifs for HDAC2 (MIM: 605164).35 Several correlated SNPs lie in regions marked by enhancer histone modifications or DNase hypersensitivity in multiple tissues or cell lines, based on data from the NIH Roadmap Epigenome Project and ENCODE.36, 37 Recruitment of the transcription factor JUND (MIM: 165162) has been reported ∼6.5 kb away from rs11765529. Of potential interest, rs7798662 and rs11765529 lie within a 3.3 Mb region designated as a long-range epigenetic activation (LREA) domain in prostate cancer cells; this region, which encompasses a total of ten genes (including POM121L12), was identified as transcriptionally active in prostate cancer cell lines relative to normal primary prostate cells.38 The POM121L12 protein is largely uncharacterized with respect to biological function, but is related to the POM121 transmembrane nucleoporin, a key component of the nuclear pore complex, which enables transport into and out of the nucleus.39, 40

The rs4930068 variant at 11p15.5 is located 5.4 kb upstream (5′) of the transcriptional start site for ASCL2, which encodes the Achaete-Scute Family bHLH Transcription Factor 2 (Figure S6B). rs4930068 is in strong LD (r² > 0.8) with 41 additional variants located within 20 kb, including 30 SNPs within 8 kb (r² > 0.90).34 rs4930068 lies in a 7,400-bp region characterized as heterochromatin in esophageal tissue, according to chromatin state segmentation data derived from the Roadmap Epigenome Project.37 Multiple SNPs strongly correlated with rs4930068 (rs11021967, rs7396048, rs2285567, rs11022026, and rs6578259) are located within segments marked by enhancer histone marks in the esophagus and/or distal gastrointestinal (GI) mucosa and appear to hold high regulatory potential (Table S1). Several of these variants lie within regions characterized by DNase hypersensitivity and transcription factor recruitment (TAL1 [MIM: 187040], USF1 [MIM: 191523], E2F6 [MIM: 602944], ELF1 [MIM: 189973], MAX [MIM: 154950]) in various cells/tissues and modify predicted regulatory binding motifs.36 Some of these SNPs (rs7396048, rs2285567, rs11022026) are potential expression quantitative trait loci (eQTLs) in whole blood or other tissues.36

The ASCL2 protein is a member of the basic helix-loop-helix (bHLH) family of transcription factors, which dimerize and bind to DNA via their HLH and basic domains, respectively.40 bHLH proteins play important general roles in cell fate specification and lineage-specific transcriptional control. ASCL1 (MIM: 100790) and ASCL2 are essential factors in the development of the neuroectoderm and trophectoderm, respectively,41, 42 and both proteins are frequently expressed in human cancers. ASCL2, an apparent target gene of the Wnt signaling pathway, was shown to be upregulated in colorectal neoplasia43 and has been linked to an intestinal stem cell expression signature that defines a majority of colorectal cancers (MIM: 114500)44 and may be present in precancerous adenomas.45 Low-level expression of ASCL2 mRNA has also recently been reported in BE,46 while studies in human esophageal cell lines have described induction of ASCL2 mRNA and protein in a sub-population of stem-like spheroid cells derived in culture.47 Of particular interest, ASCL2 was shown to suppress expression of the homeobox transcription factor CDX2 (MIM: 600297),48 a known early marker of BE49 that is subsequently downregulated during EA pathogenesis.50

Based on our in silico functional characterization of rs4930068 and correlated variants, a potential regulatory role for these SNPs in influencing ASCL2 expression levels appears plausible. However, future laboratory-based studies are clearly required to assess possible functional effects. Recent evidence has suggested that Barrett’s esophagus may represent the outgrowth of a residual embryonic epithelial cell population, rather than squamous-to-columnar trans-differentiation of resident stratified epithelium.51, 52 In this context, it is intriguing to consider the possibility that the observed interaction between rs4930068 and GERD in relation to risk of BE might in part reflect alterations in the expression level of ASCL2, a transcription factor with known roles in both development and cancer.

Discussion

In summary, we have proposed constrained testing strategies for identifying interactions between a genotype and multiple related environmental exposures: the one-sided interactions, the bidirectional interactions, and the proportional interactions (on the log OR scale relative to respective main environmental associations). Several statistical contributions were made in this work. First, an approximate score statistic was developed for the one-sided test, which circumvents the difficulty of obtaining an exact asymptotic distribution, and further extended to the bidirectional test. Second, for the proportional interaction test, we proposed a second-order correction to the mean and the variance of the score statistic, which resolved the issue of the inflated type I error rate in genome-wide testing. Our proposed methods were applied to the BEACON study and identified several loci that interact with three environmental exposures in Barrett’s esophagus with genome-wide significance.

Our work is exploratory in nature, as are other methodological developments in genome-wide association testing. There is a paucity of established G × E, largely because of the low power of such exploration. The goal of our work is to enrich G × E findings that may lead to further validation and functional studies. These G × E could help in risk prediction and stratification, or possibly individualized prevention for modifiable exposures.

Our simulation study suggests that the one-sided test nearly always outperforms, if only slightly, the proportional interaction test, even in the scenario where interactions are generated to be proportional. This is primarily because the proportional test is a two-sided test and has increased variability owing to its dependence on the main associations of environmental exposures. When the latter are estimated with considerable uncertainty, for example, when the sample size is small, the proportional test is not very appealing in power performance. However, in consortium studies where investigators can assemble a large sample size, the proportional test can be advantageous when the interactions are truly directional (Figures 2A and 2B). When such a constraint is violated, the proportional test can be powerless, as shown in Figures 1C, 1D, 2C, and 2D. The directional test strikes a balance between power and robustness, performing competitively under true model constraints and displaying robustness against model violations.

In cancer epidemiology, if several exposures share some etiological pathways, our simulation shows that testing one exposure at a time is inferior in power to simultaneously testing multiple G × E altogether. We show an example of applying such testing strategies in the BEACON study and we found several interesting genome-wide significant hits. We note that the suitability of the proposed constrained inference depends heavily on study context and a priori scientific knowledge. Other plausible settings include multiple measures of the same underlying exposure profile, such as measures of smoking and diet intake. Investigators should exercise caution when applying the constrained inference since, as we show in the simulation, violations of model constraints can result in deteriorated power.

Beyond the multivariate G × E testing we consider, constrained inference strategies for a single G × E test have been explored. For example, it is often reasonable to constrain the interaction to be quantitative, because epidemiologists believe that qualitative interaction is generally unnatural and rare, if it exists at all.53 Some works have been developed along this line, possibly incorporating the powerful but controversial gene-environment independence assumption.54, 55 From a methodological point of view, the latter assumption could be incorporated into the constrained score statistics we developed in this article, with some additional work. For the BEACON study, however, this assumption does not hold because BMI and GERD are both phenotypes with substantial genetic susceptibility.56, 57

We close with a brief discussion on generalizing the proposed methods. Tukey’s test has been formulated to study interactions between multiple genes and multiple environmental exposures. In principle, the one-sided and bidirectional testing strategies can be applied equally to one exposure of interest and a set of SNPs in a gene, or a set of known risk loci. The proportional interaction test naturally applies to this setting. One complexity for the one-sided test is that, in contrast to the BEACON study where all three exposures are known to be hazardous, it may be entirely unknown which alleles of the SNPs in a gene may modify the exposure association in the same direction. More understanding of the function of the alleles is therefore required before motivating the directional hypotheses for one exposure of interest and a set of SNPs in the gene. The other potential issue for one exposure and a set of SNPs is that the number of SNPs has to be limited, because the dimensionality would affect the degrees of freedom in the chi-bar-squared distribution, which will in turn influence power performance.
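To give a sense of how dimensionality enters through the chi-bar-squared distribution, the sketch below computes tail probabilities and critical values for a mixture of chi-square distributions. It rests on an assumption not restated in this section: that, after transformation, the null distribution of the one-sided statistic with p constrained parameters uses the equal-orthant binomial weights C(p, k)/2^p on k degrees of freedom. The function names are hypothetical, and the chi-square tail is obtained from the standard recurrence for the regularized upper incomplete gamma function.

```python
import math


def chi2_sf(t, df):
    """Pr(chi-square with integer df >= 1 exceeds t), for t >= 0."""
    x = t / 2.0
    if df % 2 == 1:
        a, q = 0.5, math.erfc(math.sqrt(x))   # 1 degree of freedom
    else:
        a, q = 1.0, math.exp(-x)              # 2 degrees of freedom
    # Recurrence Q(a + 1, x) = Q(a, x) + x^a e^{-x} / Gamma(a + 1).
    while 2.0 * a < df:
        q += x ** a * math.exp(-x) / math.gamma(a + 1.0)
        a += 1.0
    return q


def chi_bar_sq_pvalue(t, p):
    """Tail probability of the assumed binomial-weight chi-bar-squared mixture."""
    return sum(math.comb(p, k) / 2.0 ** p * chi2_sf(t, k)
               for k in range(1, p + 1))


def critical_value(alpha, p, lo=0.0, hi=200.0):
    """Bisection for the threshold t with chi_bar_sq_pvalue(t, p) = alpha."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi_bar_sq_pvalue(mid, p) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for p in (2, 3, 5, 10):
    print(p, round(critical_value(5e-8, p), 2))
```

Under these assumed weights the genome-wide critical value grows with the number of constrained parameters, which is one way to see why the number of SNPs in a set would need to be limited.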

Consortia

The following BEACON consortium members contributed to this paper: Wong-Ho Chow, Nicholas J. Shaheen, Lesley Anderson, Douglas A. Corley, Marilie D. Gammon, Laura J. Hardie, Jesper Lagergren, and David C. Whiteman. Full addresses are listed in Supplemental Data.

Acknowledgments

This work was supported in part by NIH grants R01 HL114901, R01 CA136725, R21 CA197502, and P01 CA53996. We thank Nilanjan Chatterjee for helpful comments.

Published: August 4, 2016

Footnotes

Supplemental Data include six figures, one table, and BEACON consortium membership and can be found with this article online at http://dx.doi.org/10.1016/j.ajhg.2016.06.018.

Contributor Information

James Y. Dai, Email: jdai@fredhutch.org.


Appendix A

Construction of the Transformation Matrix A

As stipulated in Material and Methods, $A^{T}A = I_{\theta\theta,\beta}$ and its center direction should coincide with the center direction of the negative orthant. In addition, it would be appealing that $A(\delta)$ is invariant to the order of the elements in $U$, since the order of the interaction terms tested in the alternative hypothesis should not change the test statistic. We adopt a linear ordering method previously developed to re-arrange $U$ and $I_{\theta\theta,\beta}$.58 Next we find $A$ that satisfies Equations 3 and 4. Let $H = (h_1, \ldots, h_p)$ be the orthogonal matrix obtained by applying the Gram-Schmidt orthonormalization to the independent set of vectors $(J, e_1, \ldots, e_p)$. Let $C^{T}C = I_{\theta\theta,\beta}$ be the Cholesky decomposition of $I_{\theta\theta,\beta}$. Let $d_c$ denote the center direction of the image space $C(\delta) = \{C\delta \mid \delta \le 0\}$, and let $V = (v_1, \ldots, v_p)$ be the orthogonal matrix obtained by applying Gram-Schmidt orthonormalization to the independent set of vectors $(d_c, e_2, \ldots, e_k)$. Then $A = HV^{T}C$ satisfies the conditions of Equations 3 and 4.
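The construction can be sketched numerically as follows. Several details are assumptions rather than statements from the article: J is taken to be the vector −(1,…,1), the center direction of a cone is taken to be the unit vector making equal angles with the cone's edges, both Gram-Schmidt steps complete the first vector with e_2,…,e_p so that exactly p vectors are used, the linear re-ordering step is omitted, and the 3 × 3 information matrix is made up for illustration. All function names are hypothetical.

```python
import numpy as np


def gram_schmidt_basis(first):
    """Orthogonal matrix from Gram-Schmidt on (first, e_2, ..., e_p).

    QR factorization yields the Gram-Schmidt vectors up to sign, so the
    signs are fixed afterwards; requires first[0] != 0.
    """
    p = first.shape[0]
    M = np.eye(p)
    M[:, 0] = first
    Q, R = np.linalg.qr(M)
    return Q * np.sign(np.diag(R))


def center_direction(edges):
    """Unit vector with equal angles to the columns of `edges` (assumed definition)."""
    G = edges / np.linalg.norm(edges, axis=0)
    d = np.linalg.solve(G.T, np.ones(G.shape[1]))
    return d / np.linalg.norm(d)


def transformation_matrix(info):
    """Return A = H V^T C together with C, d_c, and J."""
    p = info.shape[0]
    C = np.linalg.cholesky(info).T       # upper triangular, C.T @ C == info
    J = -np.ones(p)                      # assumed center of the negative orthant
    d_c = center_direction(-C)           # edges of {C delta : delta <= 0}
    H = gram_schmidt_basis(J)
    V = gram_schmidt_basis(d_c)
    return H @ V.T @ C, C, d_c, J


# Made-up positive definite matrix standing in for the information matrix.
info = np.array([[2.0, 0.5, 0.3],
                 [0.5, 1.5, 0.2],
                 [0.3, 0.2, 1.0]])
A, C, d_c, J = transformation_matrix(info)

print(np.allclose(A.T @ A, info))        # A reproduces the information matrix
image = A @ np.linalg.solve(C, d_c)      # where A sends the cone's center
print(image / np.linalg.norm(image))     # should align with J / ||J||
```

Because H and V are orthogonal, A^T A equals C^T C regardless of how the center direction is defined; the assumed definition only affects where the cone is rotated to.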

Computation of Tail p Values for the Bidirectional Hypothesis by Algebraic Approximation

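It is instructive to show the approximation when p = 2. The calculation in a higher dimension follows without difficulty. Specifically, let $(z_1, z_2)$ denote the standard bivariate normal variables. The four quadrants are denoted by $Q_1$, $Q_2$, $Q_3$, and $Q_4$, respectively. The p value for the observed $T_c = t$ is the sum of components from each quadrant. The components of p value in $Q_1$ and $Q_3$ follow a standard $\chi^2_2$ distribution. For components in $Q_2$ and $Q_4$,

$$
\begin{aligned}
\Pr\big(T_c > t \mid (z_1, z_2) \in Q_2 \cup Q_4\big)
&= \Pr\big(z_2^2 > t,\ |z_2| \ge |z_1| \mid (z_1, z_2) \in Q_2 \cup Q_4\big) \\
&\quad + \Pr\big(z_1^2 > t,\ |z_1| > |z_2| \mid (z_1, z_2) \in Q_2 \cup Q_4\big) \\
&< \Pr\big(z_2^2 > t \mid (z_1, z_2) \in Q_2 \cup Q_4\big) + \Pr\big(z_1^2 > t \mid (z_1, z_2) \in Q_2 \cup Q_4\big) \\
&= \Pr(\chi^2_1 > t) + \Pr(\chi^2_1 > t).
\end{aligned}
$$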

It is clear that the larger t is, the better the approximation. In Figures S1 and S2, we show a comparison of simulation-based p values and approximation-based p values. For p values < 0.01, the difference is essentially negligible.
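For p = 2 the quality of the approximation can also be checked in closed form. The sketch below assumes, as in the appendix, that z1 and z2 are independent standard normal variables, and that each pair of opposite quadrants carries probability 1/2; the exact expression for the mixed quadrants, Pr(max(z1², z2²) > t) = 2q − q² with q = Pr(χ²₁ > t), is a derivation added here for illustration rather than a formula from the article. Function names are hypothetical.

```python
import math


def chi2_sf_1df(t):
    """Pr(chi-square with 1 df > t)."""
    return math.erfc(math.sqrt(t / 2.0))


def chi2_sf_2df(t):
    """Pr(chi-square with 2 df > t)."""
    return math.exp(-t / 2.0)


def bidirectional_p_approx(t):
    """Algebraic approximation: the bound 2 * Pr(chi2_1 > t) in Q2 and Q4."""
    return 0.5 * chi2_sf_2df(t) + 0.5 * (2.0 * chi2_sf_1df(t))


def bidirectional_p_exact(t):
    """Exact tail for p = 2: max(z1^2, z2^2) exceeds t in Q2 and Q4."""
    q = chi2_sf_1df(t)
    return 0.5 * chi2_sf_2df(t) + 0.5 * (2.0 * q - q * q)


for t in (2.0, 5.0, 9.0, 20.0, 40.0):
    approx = bidirectional_p_approx(t)
    exact = bidirectional_p_exact(t)
    print(t, approx, exact, (approx - exact) / exact)
```

The absolute error equals q²/2, so the relative error shrinks quickly as t grows, in line with the statement that the difference is negligible for small p values.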

Correcting the Mean and the Variance of the Score Statistic for the Proportional Interaction

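For dichotomous outcomes, the exact computation of the mean of the second-order Taylor expansion of $\tilde S_\gamma$, namely

$$
S_\gamma + \tilde H_{\gamma\beta}\tilde I_{\beta\beta}^{-1}\tilde S_\beta + \tfrac{1}{2}\tilde S_\beta^{T}\tilde I_{\beta\beta}^{-1}\tilde H_{\gamma\beta\beta}\tilde I_{\beta\beta}^{-1}\tilde S_\beta,
$$

can be derived algebraically, since

$$
\begin{aligned}
\tau_d &= E\Big(S_\gamma + \tilde H_{\gamma\beta}\tilde I_{\beta\beta}^{-1}\tilde S_\beta + \tfrac{1}{2}\tilde S_\beta^{T}\tilde I_{\beta\beta}^{-1}\tilde H_{\gamma\beta\beta}\tilde I_{\beta\beta}^{-1}\tilde S_\beta\Big) \\
&= E\Big[\Big(\sum_i \tilde K_{\gamma\beta,i}\Big)\tilde I_{\beta\beta}^{-1}\Big(\sum_i \tilde S_{\beta,i}\Big) + \tfrac{1}{2}\Big(\sum_i \tilde S_{\beta,i}\Big)^{T}\tilde I_{\beta\beta}^{-1}\tilde H_{\gamma\beta\beta}\tilde I_{\beta\beta}^{-1}\Big(\sum_i \tilde S_{\beta,i}\Big)\Big] \\
&= \sum_i E\Big(\tilde K_{\gamma\beta,i}\tilde I_{\beta\beta}^{-1}\tilde S_{\beta,i} + \tfrac{1}{2}\tilde S_{\beta,i}^{T}\tilde I_{\beta\beta}^{-1}\tilde H_{\gamma\beta\beta}\tilde I_{\beta\beta}^{-1}\tilde S_{\beta,i}\Big) \\
&= \sum_i \Big[\mu_i\Big(\tilde K_{\gamma\beta,i1}\tilde I_{\beta\beta}^{-1}\tilde S_{\beta,i1} + \tfrac{1}{2}\tilde S_{\beta,i1}^{T}\tilde I_{\beta\beta}^{-1}\tilde H_{\gamma\beta\beta}\tilde I_{\beta\beta}^{-1}\tilde S_{\beta,i1}\Big) \\
&\qquad\quad + (1-\mu_i)\Big(\tilde K_{\gamma\beta,i0}\tilde I_{\beta\beta}^{-1}\tilde S_{\beta,i0} + \tfrac{1}{2}\tilde S_{\beta,i0}^{T}\tilde I_{\beta\beta}^{-1}\tilde H_{\gamma\beta\beta}\tilde I_{\beta\beta}^{-1}\tilde S_{\beta,i0}\Big)\Big],
\end{aligned}
$$

where $\tilde K_{\gamma\beta,i} = V_i(Y_i - \mu_i)$ and $\tilde S_{\beta,i}$ are the respective individual contributions from the $i$th subject, $\tilde K_{\gamma\beta,i1}$ and $\tilde S_{\beta,i1}$ are contributions from the $i$th subject evaluated at $Y_i = 1$, and $\tilde K_{\gamma\beta,i0}$ and $\tilde S_{\beta,i0}$ are contributions from the $i$th subject evaluated at $Y_i = 0$. The second-to-last equality uses the independence between the $i$th subject and the $j$th subject, if $i \ne j$. The last equality uses the unique feature that $Y_i$ takes two values $(1, 0)$ with probability $\mu_i$ and $(1 - \mu_i)$, respectively. Similar, but more tedious, calculation applies to

$$
\Sigma_d = \mathrm{Var}\Big(S_\gamma + \tilde H_{\gamma\beta}\tilde I_{\beta\beta}^{-1}\tilde S_\beta + \tfrac{1}{2}\tilde S_\beta^{T}\tilde I_{\beta\beta}^{-1}\tilde H_{\gamma\beta\beta}\tilde I_{\beta\beta}^{-1}\tilde S_\beta\Big).
$$

Let $\Delta = \sum_i X_i\big(\sum_j \tilde\beta_{2j} G_i X_{ij}\big)\mu_i(1 - \mu_i)$. The key terms involved in the calculation are expressed below:

$$
\begin{aligned}
E\big(S_\gamma + \tilde H_{\gamma\beta}\tilde I_{\beta\beta}^{-1}\tilde S_\beta\big)^2 &= \sum_i E\big(S_{\gamma,i} + (\Delta + \tilde K_{\gamma\beta,i})\tilde I_{\beta\beta}^{-1}\tilde S_{\beta,i}\big)^2 + \sum_{i \ne j} E\big(\tilde K_{\gamma\beta,i}\tilde I_{\beta\beta}^{-1}\tilde S_{\beta,j}\big)^2 \\
E\big(\tilde S_\beta^{T}\tilde I_{\beta\beta}^{-1}\tilde H_{\gamma\beta\beta}\tilde I_{\beta\beta}^{-1}\tilde S_\beta\big)^2 &= \sum_{i \ne j} E\big(\tilde S_{\beta,i}^{T}\tilde I_{\beta\beta}^{-1}\tilde H_{\gamma\beta\beta}\tilde I_{\beta\beta}^{-1}\tilde S_{\beta,i}\big)^2 \\
E\big(S_\gamma \tilde S_\beta^{T}\tilde I_{\beta\beta}^{-1}\tilde H_{\gamma\beta\beta}\tilde I_{\beta\beta}^{-1}\tilde S_\beta\big) &= \sum_i \big(S_{\gamma,i}\tilde S_{\beta,i}^{T}\tilde I_{\beta\beta}^{-1}\tilde H_{\gamma\beta\beta}\tilde I_{\beta\beta}^{-1}\tilde S_{\beta,i}\big) \\
E\big((\tilde H_{\gamma\beta}\tilde I_{\beta\beta}^{-1}\tilde S_\beta)(\tilde S_\beta^{T}\tilde I_{\beta\beta}^{-1}\tilde H_{\gamma\beta\beta}\tilde I_{\beta\beta}^{-1}\tilde S_\beta)\big) &= \sum_i E\big((\Delta\tilde I_{\beta\beta}^{-1}\tilde S_{\beta,i})(\tilde S_{\beta,i}^{T}\tilde I_{\beta\beta}^{-1}\tilde H_{\gamma\beta\beta}\tilde I_{\beta\beta}^{-1}\tilde S_{\beta,i})\big) \\
&\quad + \sum_{i \ne j} E\big((\tilde K_{\gamma\beta,i}\tilde I_{\beta\beta}^{-1}\tilde S_{\beta,j})(\tilde S_{\beta,i}^{T}\tilde I_{\beta\beta}^{-1}\tilde H_{\gamma\beta\beta}\tilde I_{\beta\beta}^{-1}\tilde S_{\beta,j})\big).
\end{aligned}
$$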
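The two-point argument used for the mean can be illustrated on its simplest piece, the expectation of a quadratic form in the summed score. The sketch below is not the authors' implementation: it uses a generic logistic-type score contribution x_i(Y_i − μ_i), treats a fixed symmetric matrix M as a stand-in for the product of the inverse information, the second-derivative term, and the inverse information, and uses made-up inputs. It compares the per-subject two-point formula with brute-force enumeration of all binary outcome vectors.

```python
import itertools

import numpy as np


def expected_quadratic_two_point(X, mu, M):
    """E[(sum_i S_i)^T M (sum_i S_i)] via the two values Y_i can take.

    Cross terms between subjects vanish because the S_i are independent
    with mean zero, so only per-subject terms remain.
    """
    total = 0.0
    for x_i, mu_i in zip(X, mu):
        s1 = x_i * (1.0 - mu_i)   # contribution evaluated at Y_i = 1
        s0 = x_i * (0.0 - mu_i)   # contribution evaluated at Y_i = 0
        total += mu_i * (s1 @ M @ s1) + (1.0 - mu_i) * (s0 @ M @ s0)
    return total


def expected_quadratic_enumerated(X, mu, M):
    """Same expectation by summing over all 2^n outcome vectors."""
    total = 0.0
    for outcome in itertools.product([0, 1], repeat=len(mu)):
        y = np.array(outcome, dtype=float)
        prob = np.prod(np.where(y == 1, mu, 1.0 - mu))
        s = X.T @ (y - mu)
        total += prob * (s @ M @ s)
    return total


# Made-up inputs: four subjects, two covariates.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 2))
mu = np.array([0.2, 0.5, 0.7, 0.4])
M = np.array([[2.0, 0.3],
              [0.3, 1.0]])

two_point = expected_quadratic_two_point(X, mu, M)
enumerated = expected_quadratic_enumerated(X, mu, M)
print(two_point, enumerated)
```

The two-point version costs a single pass over subjects, which is what makes an exact correction feasible at genome-wide scale, whereas enumeration is only practical for a handful of subjects.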

Web Resources

Supplemental Data

Document S1. Figures S1–S6, Table S1, and BEACON Consortium Membership
Document S2. Article plus Supplemental Data

References

  • 1.Brennan P. Gene-environment interaction and aetiology of cancer: what does it mean and how can we measure it? Carcinogenesis. 2002;23:381–387. doi: 10.1093/carcin/23.3.381. [DOI] [PubMed] [Google Scholar]
  • 2.Hunter D.J. Gene-environment interactions in human diseases. Nat. Rev. Genet. 2005;6:287–298. doi: 10.1038/nrg1578. [DOI] [PubMed] [Google Scholar]
  • 3.Thomas D. Gene--environment-wide association studies: emerging approaches. Nat. Rev. Genet. 2010;11:259–272. doi: 10.1038/nrg2764. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Wu C., Kraft P., Zhai K., Chang J., Wang Z., Li Y., Hu Z., He Z., Jia W., Abnet C.C. Genome-wide association analyses of esophageal squamous cell carcinoma in Chinese identify multiple susceptibility loci and gene-environment interactions. Nat. Genet. 2012;44:1090–1097. doi: 10.1038/ng.2411. [DOI] [PubMed] [Google Scholar]
  • 5.Garcia-Closas M., Rothman N., Figueroa J.D., Prokunina-Olsson L., Han S.S., Baris D., Jacobs E.J., Malats N., De Vivo I., Albanes D. Common genetic polymorphisms modify the effect of smoking on absolute risk of bladder cancer. Cancer Res. 2013;73:2211–2220. doi: 10.1158/0008-5472.CAN-12-2388. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Prentice R.L. Empirical evaluation of gene and environment interactions: methods and potential. J. Natl. Cancer Inst. 2011;103:1209–1210. doi: 10.1093/jnci/djr279. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Hutter C.M., Mechanic L.E., Chatterjee N., Kraft P., Gillanders E.M., NCI Gene-Environment Think Tank Gene-environment interactions in cancer epidemiology: a National Cancer Institute Think Tank report. Genet. Epidemiol. 2013;37:643–657. doi: 10.1002/gepi.21756. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Chatterjee N., Carroll R.J. Semiparametric maximum likelihood estimation exploiting gene-environment indepdendence in case-control studies. Biometrika. 2005;92:399–418. [Google Scholar]
  • 9.Mukherjee B., Chatterjee N. Exploiting gene-environment independence for analysis of case-control studies: an empirical Bayes-type shrinkage estimator to trade off between bias and efficiency. Biometrics. 2008;64:685–694. doi: 10.1111/j.1541-0420.2007.00953.x. [DOI] [PubMed] [Google Scholar]
  • 10.Dai J.Y., Kooperberg C., Leblanc M., Prentice R.L. Two-stage testing procedures with independent filtering for genome-wide gene-environment interaction. Biometrika. 2012;99:929–944. doi: 10.1093/biomet/ass044. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Hsu L., Jiao S., Dai J.Y., Hutter C., Peters U., Kooperberg C. Powerful cocktail methods for detecting genome-wide gene-environment interaction. Genet. Epidemiol. 2012;36:183–194. doi: 10.1002/gepi.21610. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Kooperberg C., Dai J.Y., Hsu L. MIT Press; Cambridge: 2015. Methodological and Statistical Issues in Gene-Environment and Gene-Gene Interactions for Complex Phenotypes. [Google Scholar]
  • 13.VanderWeele T.J. Sufficient cause interactions and statistical interactions. Epidemiology. 2009;20:6–13. doi: 10.1097/EDE.0b013e31818f69e7. [DOI] [PubMed] [Google Scholar]
  • 14.Campa D., Kaaks R., Le Marchand L., Haiman C.A., Travis R.C., Berg C.D., Buring J.E., Chanock S.J., Diver W.R., Dostal L. Interactions between genetic variants and breast cancer risk factors in the breast and prostate cancer cohort consortium. J. Natl. Cancer Inst. 2011;103:1252–1263. doi: 10.1093/jnci/djr265. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Hutter C.M., Chang-Claude J., Slattery M.L., Pflugeisen B.M., Lin Y., Duggan D., Nan H., Lemire M., Rangrej J., Figueiredo J.C. Characterization of gene-environment interactions for colorectal cancer susceptibility loci. Cancer Res. 2012;72:2036–2044. doi: 10.1158/0008-5472.CAN-11-4067. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Blot W.J., Devesa S.S., Kneller R.W., Fraumeni J.F.J., Jr. Rising incidence of adenocarcinoma of the esophagus and gastric cardia. JAMA. 1991;265:1287–1289. [PubMed] [Google Scholar]
  • 17.Brown L.M., Devesa S.S., Chow W.H. Incidence of adenocarcinoma of the esophagus among white Americans by sex, stage, and age. J. Natl. Cancer Inst. 2008;100:1184–1187. doi: 10.1093/jnci/djn211. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Thrift A.P., Whiteman D.C. The incidence of esophageal adenocarcinoma continues to rise: analysis of period and birth cohort effects on recent trends. Ann. Oncol. 2012;23:3155–3162. doi: 10.1093/annonc/mds181. [DOI] [PubMed] [Google Scholar]
  • 19.Vaughan T.L., Fitzgerald R.C. Precision prevention of oesophageal adenocarcinoma. Nat. Rev. Gastroenterol. Hepatol. 2015;12:243–248. doi: 10.1038/nrgastro.2015.24. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Reid B.J., Li X., Galipeau P.C., Vaughan T.L. Barrett’s oesophagus and oesophageal adenocarcinoma: time for a new synthesis. Nat. Rev. Cancer. 2010;10:87–101. doi: 10.1038/nrc2773. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Engel L.S., Chow W.H., Vaughan T.L., Gammon M.D., Risch H.A., Stanford J.L., Schoenberg J.B., Mayne S.T., Dubrow R., Rotterdam H. Population attributable risks of esophageal and gastric cancers. J. Natl. Cancer Inst. 2003;95:1404–1413. doi: 10.1093/jnci/djg047. [DOI] [PubMed] [Google Scholar]
  • 22.Olsen C.M., Pandeya N., Green A.C., Webb P.M., Whiteman D.C., Australian Cancer Study Population attributable fractions of adenocarcinoma of the esophagus and gastroesophageal junction. Am. J. Epidemiol. 2011;174:582–590. doi: 10.1093/aje/kwr117. [DOI] [PubMed] [Google Scholar]
  • 23.Levine D.M., Ek W.E., Zhang R., Liu X., Onstad L., Sather C., Lao-Sirieix P., Gammon M.D., Corley D.A., Shaheen N.J. A genome-wide association study identifies new susceptibility loci for esophageal adenocarcinoma and Barrett’s esophagus. Nat. Genet. 2013;45:1487–1493. doi: 10.1038/ng.2796. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Dai J.Y., de Dieu Tapsoba J., Buas M.F., Onstad L.E., Levine D.M., Risch H.A., Chow W.H., Bernstein L., Ye W., Lagergren J. A newly identified susceptibility locus near FOXP1 modifies the association of gastroesophageal reflux with Barrett’s esophagus. Cancer Epidemiol. Biomarkers Prev. 2015;24:1739–1747. doi: 10.1158/1055-9965.EPI-15-0507. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Corley D.A., Kubo A., Zhao W. Abdominal obesity, ethnicity and gastro-oesophageal reflux symptoms. Gut. 2007;56:756–762. doi: 10.1136/gut.2006.109413. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Pocock S.J., Geller N.L., Tsiatis A.A. The analysis of multiple endpoints in clinical trials. Biometrics. 1987;43:487–498. [PubMed] [Google Scholar]
  • 27.Tang D.I., Gnecco C., Geller N.L. An approximate likelihood ratio test for a normal mean vector with nonnegative components with application to clinical trials. Biometrika. 1989;76:577–583. [Google Scholar]
  • 28.Tukey J.W. One degree of freedom for non-additivity. Biometrics. 1949;5:232–242. [Google Scholar]
  • 29.Mandel J. Non-additivity in two-way analysis of variance. J. Am. Stat. Assoc. 1961;56:878–888. [Google Scholar]
  • 30.Chatterjee N., Kalaylioglu Z., Moslehi R., Peters U., Wacholder S. Powerful multilocus tests of genetic association in the presence of gene-gene and gene-environment interactions. Am. J. Hum. Genet. 2006;79:1002–1016. doi: 10.1086/509704. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Kudo A. A multivariate analogue of the one-sided test. Biometrika. 1963;50:403–418. [Google Scholar]
  • 32.Shapiro A. Asymptotic distribution of test statistics in the analysis of moment structures under inequality constraints. Biometrika. 1985;72:133–144. [Google Scholar]
  • 33.Dudbridge F. Power and predictive accuracy of polygenic risk scores. PLoS Genet. 2013;9:e1003348. doi: 10.1371/journal.pgen.1003348. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Abecasis G.R., Auton A., Brooks L.D., DePristo M.A., Durbin R.M., Handsaker R.E., Kang H.M., Marth G.T., McVean G.A., 1000 Genomes Project Consortium An integrated map of genetic variation from 1,092 human genomes. Nature. 2012;491:56–65. doi: 10.1038/nature11632. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Ward L.D., Kellis M. HaploReg: a resource for exploring chromatin states, conservation, and regulatory motif alterations within sets of genetically linked variants. Nucleic Acids Res. 2012;40:D930–D934. doi: 10.1093/nar/gkr917. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Bernstein B.E., Stamatoyannopoulos J.A., Costello J.F., Ren B., Milosavljevic A., Meissner A., Kellis M., Marra M.A., Beaudet A.L., Ecker J.R. The nih roadmap epigenomics mapping consortium. Nat. Biotechnol. 2010;28:1045–1048. doi: 10.1038/nbt1010-1045. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Dunham I., Kundaje A., Aldred S.F., Collins P.J., Davis C., Doyle F., ENCODE Project Consortium An integrated encyclopedia of DNA elements in the human genome. Nature. 2012;489:57–74. doi: 10.1038/nature11247. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Bert S.A., Robinson M.D., Strbenac D., Statham A.L., Song J.Z., Hulf T., Sutherland R.L., Coolen M.W., Stirzaker C., Clark S.J. Regional activation of the cancer genome by long-range epigenetic remodeling. Cancer Cell. 2013;23:9–22. doi: 10.1016/j.ccr.2012.11.006.
  • 39.Hallberg E., Wozniak R.W., Blobel G. An integral membrane protein of the pore membrane domain of the nuclear envelope contains a nucleoporin-like region. J. Cell Biol. 1993;122:513–521. doi: 10.1083/jcb.122.3.513.
  • 40.Jones S. An overview of the basic helix-loop-helix proteins. Genome Biol. 2004;5:226. doi: 10.1186/gb-2004-5-6-226.
  • 41.Guillemot F., Lo L.C., Johnson J.E., Auerbach A., Anderson D.J., Joyner A.L. Mammalian achaete-scute homolog 1 is required for the early development of olfactory and autonomic neurons. Cell. 1993;75:463–476. doi: 10.1016/0092-8674(93)90381-y.
  • 42.Guillemot F., Nagy A., Auerbach A., Rossant J., Joyner A.L. Essential role of Mash-2 in extraembryonic development. Nature. 1994;371:333–336. doi: 10.1038/371333a0.
  • 43.Jubb A.M., Chalasani S., Frantz G.D., Smits R., Grabsch H.I., Kavi V., Maughan N.J., Hillan K.J., Quirke P., Koeppen H. Achaete-scute like 2 (ascl2) is a target of Wnt signalling and is upregulated in intestinal neoplasia. Oncogene. 2006;25:3445–3457. doi: 10.1038/sj.onc.1209382.
  • 44.Ziskin J.L., Dunlap D., Yaylaoglu M., Fodor I.K., Forrest W.F., Patel R., Ge N., Hutchins G.G., Pine J.K., Quirke P. In situ validation of an intestinal stem cell signature in colorectal cancer. Gut. 2013;62:1012–1023. doi: 10.1136/gutjnl-2011-301195.
  • 45.Jang B.G., Kim H.S., Kim K.J., Rhee Y.Y., Kim W.H., Kang G.H. Distribution of intestinal stem cell markers in colorectal precancerous lesions. Histopathology. 2016;68:567–577. doi: 10.1111/his.12787.
  • 46.Jang B.G., Lee B.L., Kim W.H. Intestinal stem cell markers in the intestinal metaplasia of stomach and Barrett’s esophagus. PLoS ONE. 2015;10:e0127300. doi: 10.1371/journal.pone.0127300.
  • 47.Zhao R., Quaroni L., Casson A.G. Identification and characterization of stemlike cells in human esophageal adenocarcinoma and normal epithelial cell lines. J. Thorac. Cardiovasc. Surg. 2012;144:1192–1199. doi: 10.1016/j.jtcvs.2012.08.008.
  • 48.Shang Y., Pan Q., Chen L., Ye J., Zhong X., Li X., Meng L., Guo J., Tian Y., He Y. Achaete scute-like 2 suppresses CDX2 expression and inhibits intestinal neoplastic epithelial cell differentiation. Oncotarget. 2015;6:30993–31006. doi: 10.18632/oncotarget.5206.
  • 49.Moons L.M.G., Bax D.A., Kuipers E.J., Van Dekken H., Haringsma J., Van Vliet A.H., Siersema P.D., Kusters J.G. The homeodomain protein CDX2 is an early marker of Barrett’s oesophagus. J. Clin. Pathol. 2004;57:1063–1068. doi: 10.1136/jcp.2003.015727.
  • 50.Hayes S., Ahmed S., Clark P. Immunohistochemical assessment for Cdx2 expression in the Barrett metaplasia-dysplasia-adenocarcinoma sequence. J. Clin. Pathol. 2011;64:110–113. doi: 10.1136/jcp.2010.075945.
  • 51.Wang X., Ouyang H., Yamamoto Y., Kumar P.A., Wei T.S., Dagher R., Vincent M., Lu X., Bellizzi A.M., Ho K.Y. Residual embryonic cells as precursors of a Barrett’s-like metaplasia. Cell. 2011;145:1023–1035. doi: 10.1016/j.cell.2011.05.026.
  • 52.Xian W., Ho K.Y., Crum C.P., McKeon F. Cellular origin of Barrett’s esophagus: controversy and therapeutic implications. Gastroenterology. 2012;142:1424–1430. doi: 10.1053/j.gastro.2012.04.028.
  • 53.Weiss N.S. Subgroup-specific associations in the face of overall null results: should we rush in or fear to tread? Cancer Epidemiol. Biomarkers Prev. 2008;17:1297–1299. doi: 10.1158/1055-9965.EPI-08-0144.
  • 54.Han S.S., Rosenberg P.S., Chatterjee N. Testing for gene-environment and gene-gene interaction under monotonicity constraints. J. Am. Stat. Assoc. 2012;107:1441–1452.
  • 55.Song M., Nicolae D.L. Restricted parameter space models for testing gene-gene interaction. Genet. Epidemiol. 2009;33:386–393. doi: 10.1002/gepi.20392.
  • 56.Locke A.E., Kahali B., Berndt S.I., Justice A.E., Pers T.H., Day F.R., Powell C., Vedantam S., Buchkovich M.L., Yang J., LifeLines Cohort Study. ADIPOGen Consortium. AGEN-BMI Working Group. CARDIOGRAMplusC4D Consortium. CKDGen Consortium. GLGC. ICBP. MAGIC Investigators. MuTHER Consortium. MIGen Consortium. PAGE Consortium. ReproGen Consortium. GENIE Consortium. International Endogene Consortium. Genetic studies of body mass index yield new insights for obesity biology. Nature. 2015;518:197–206. doi: 10.1038/nature14177.
  • 57.Gharahkhani P., Tung J., Hinds D., Mishra A., Vaughan T.L., Whiteman D.C., MacGregor S., Barrett’s and Esophageal Adenocarcinoma Consortium (BEACON), BEACON study investigators. Chronic gastroesophageal reflux disease shares genetic background with esophageal adenocarcinoma and Barrett’s esophagus. Hum. Mol. Genet. 2016;25:828–835. doi: 10.1093/hmg/ddv512.
  • 58.Tang D.I., Geller N.L., Pocock S.J. On the design and analysis of randomized clinical trials with multiple endpoints. Biometrics. 1993;49:23–30.

Supplementary Materials

Document S1. Figures S1–S6, Table S1, and BEACON Consortium Membership
mmc1.pdf (1.5MB, pdf)
Document S2. Article plus Supplemental Data
mmc2.pdf (2.8MB, pdf)
