Abstract
We develop a locally efficient test for (multiplicative) gene–environment interaction in family studies that collect genotypic information and environmental exposures for affected offspring along with genotypic information for their parents or relatives. The proposed test does not require modeling the effects of environmental exposures and is doubly robust in the sense of being valid if either a model for the main genetic effect holds or a model for the expected environmental exposure (given the offspring affection status and parental mating types) but not necessarily both. It extends the FBAT-I to allow for missing parental mating types and families of arbitrary size. Simulation studies and the analysis of an Alzheimer's disease study confirm the adequate performance of the proposed test.
Keywords: Causal inference, Double robustness, Effect modification, Gene–environment interaction, Genetic association, Nuclear family, Semiparametric interaction model
1. INTRODUCTION
Family studies in which genotypic and phenotypic information is collected on individuals along with genotypic information on their relatives (most commonly parents and/or siblings) are of interest when assessing genetic associations because they allow adjustment for (possibly unmeasured) confounding due to population admixture (Spielman and others, 1993). Such confounding may, for instance, occur when the distribution of alleles at the candidate locus varies across the subpopulations in the study and the trait of interest is correlated with the population substructure. The ability to adjust for (usually unmeasured) confounding is key to the success of the transmission disequilibrium test (TDT) (Spielman and others, 1993), the family-based association test (FBAT) (Laird and others, 2000; Lange and Laird, 2002), and variations thereof.
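The confounding mechanism described above can be illustrated with a small exact calculation. The sketch below is illustrative only: the two subpopulations, their allele frequencies, and their disease risks are hypothetical numbers, and Hardy-Weinberg proportions are assumed within each subpopulation. The allele has no effect on disease within either subpopulation, yet the pooled data show a clear association.

```python
from fractions import Fraction as F

# Hypothetical two-subpopulation mixture (all numbers are illustrative assumptions).
# Each entry: (mixing weight, allele frequency, disease risk). Within a
# subpopulation the risk does not depend on genotype, so the allele has no effect.
SUBPOPS = [
    (F(1, 2), F(1, 10), F(1, 20)),
    (F(1, 2), F(1, 2), F(1, 5)),
]

def genotype_prob(x, p):
    """Hardy-Weinberg probability of carrying x copies of an allele with frequency p."""
    return {0: (1 - p) ** 2, 1: 2 * p * (1 - p), 2: p ** 2}[x]

def marginal_risk(x):
    """P(Y = 1 | X = x) in the pooled population, ignoring subpopulation membership."""
    num = sum(w * genotype_prob(x, p) * r for w, p, r in SUBPOPS)
    den = sum(w * genotype_prob(x, p) for w, p, r in SUBPOPS)
    return num / den

# Pooled relative risk for two copies versus zero copies of the allele.
pooled_rr = marginal_risk(2) / marginal_risk(0)
print(float(pooled_rr))
```

The pooled relative risk exceeds 1 purely because the subpopulation with the higher allele frequency also has the higher disease risk. Conditioning on parental genotypes, as family-based tests do, removes this source of association.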
The primary focus in the literature on family-based genetic association studies has been on testing the null hypothesis of no genetic association and/or no linkage, with notable exceptions (Chatterjee and others, 2005; Cordell and others, 2004; Dudbridge, 2008). Tests for gene–environment interaction must additionally acknowledge the possible presence of a genetic main effect. Such an effect is difficult to estimate (a) because families are often sampled conditional on the trait (i.e. families may be ascertained) and (b) because of the possibility of unmeasured confounding due to population admixture/substructures. In this article, we develop inference for the genetic main effect under a relative risk (RR) model for affection status and based on family data for affected offspring. This then leads to a test for gene–environment interaction, which extends the QBAT-I test of Vansteelandt, DeMeo, and others (2008) and the more general interaction tests of Vansteelandt, VanderWeele, and others (2008) to allow for ascertainment conditions in the case of a dichotomous outcome. In addition, it extends the FBAT-I test (Lake and Laird, 2003) to allow for nuclear families of arbitrary size (rather than case–parent triads) and incomplete parental genotypes. It does not require modeling the effects of environmental exposures and is doubly robust in the sense of being valid if either a model for the genetic main effect holds or a conditional mean model for the environmental exposure (given the offspring affection status and parental mating types) but not necessarily both. In addition, it is locally most powerful when both these working models are correctly specified. The proposed methodology enables estimation of the genetic main effect size along with the magnitude of the gene–environment interaction. We illustrate the methods via simulation studies and an application to an Alzheimer's disease study.
2. MODEL AND INFERENCE
Consider a family-based study that collects genotypic data on mi (i = 1,…,n) affected offspring and their parents, where n is the number of independent families. We define Xi = (Xi1,…,Ximi) to be the vector of coded offspring genotypes. For the jth affected offspring in the ith family, the possibly multivariate environmental exposure variable is denoted by Zij and the phenotype (i.e. affection status) is denoted by Yij, which takes the value 1 (encoding affected) within the study sample. The corresponding vectors of phenotypes and environmental exposures are Yi and Zi, respectively. 𝒮i denotes the parental genotype.
Our goal in this article is to assess whether the effect of the genotype Xi on the affection status Yi is modified by environmental exposures Zi. Major obstacles in doing so are (a) the possibility that the genetic effect on the trait is confounded due to population admixture and (b) the fact that families are sampled conditional on the trait (i.e. only affected offspring are sampled). We discuss these problems and corresponding solutions in turn in Sections 2.1 and 2.2.
2.1. Phenotypic model
The effect of the genotype on the trait cannot usually be estimated from ordinary regression models for the trait, conditional on genotypes and environmental exposures, because of unmeasured confounders such as ethnicity that may be correlated with the phenotype and be associated with the genotypes. To gain insight into this problem, we consider the causal diagram (Pearl, 1995; Pearl, 2000; Robins, 2001) for a single family in Figure 1. It expresses the standard scenario for FBATs as described in Rabinowitz and Laird (2000) but augmented with exposure variables. In this diagram, U is an unmeasured confounder encoding population admixture.
Fig. 1.
Causal directed acyclic graph.
All edges in the causal diagram of Figure 1 represent the possibility of a direct causal relationship. In particular, the diagram allows for measured or unmeasured exposures (i.e. Z and U, respectively) to be associated with the parental mating types. The basic assumption in the diagram is that the unmeasured confounders are associated with the offspring's genotype only through their association with the parental genotype. Under this assumption, the genetic effect on the trait can be assessed from the conditional trait distribution . In view of this, we will postulate that the penetrance function relates to the genotype, the environmental exposures, and parental genotype through
(2.1)
where β* is an unknown finite-dimensional parameter vector and and are unknown functions. In model (2.1), the choice encodes the null hypothesis of no gene–environment interaction. Likewise, the functions and encode the main effects of environmental exposures and main genetic effects, respectively. Maternal effects and imprinting are implicitly allowed for through the unspecified term .
We define 𝒜 to be the model given by the restrictions of model (2.1) with the additional assumption that the environmental effects do not distort Mendel's law of random segregation in the sense that for , . This is a standard assumption when testing for gene–environment interaction (Lake and Laird, 2003; Umbach and Weinberg, 2000) which we will make throughout this paper and which expresses the idea that the considered environmental exposures are not affected by the candidate gene.
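Mendel's law of random segregation is what makes the offspring genotype distribution given the parental mating type a known quantity. As a minimal sketch (the function names and the 0/1 allele coding are our own illustrative choices), this distribution can be enumerated for a bi-allelic marker as follows.

```python
from fractions import Fraction as F
from itertools import product

def offspring_genotype_dist(mother, father):
    """Distribution of the offspring allele count under Mendelian transmission.

    Each parent is a pair of 0/1 indicators for the coded allele, e.g. (0, 1)
    for a heterozygote. Each parental allele is transmitted with probability 1/2,
    independently for the two parents.
    """
    dist = {0: F(0), 1: F(0), 2: F(0)}
    for a, b in product(mother, father):
        dist[a + b] += F(1, 4)
    return dist

def expected_genotype(mother, father):
    """Expected offspring allele count given the parental mating type."""
    return sum(x * p for x, p in offspring_genotype_dist(mother, father).items())

# Two heterozygous parents give the familiar 1/4, 1/2, 1/4 split.
print(offspring_genotype_dist((0, 1), (0, 1)))
```

Among affected offspring, a genetic main effect can shift the genotype distribution away from these Mendelian proportions, which is why the corresponding conditional expectation generally has to be estimated rather than taken as known.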
2.2. A doubly robust test for gene–environment interaction
To estimate β* indexing model 𝒜, suppose first that we have data available from affected case–parent triads so that for . To account for ascertainment, we use the idea of the original TDT (Spielman and others, 1993) and variations thereof (Liu and others, 2002) by following a conditional likelihood approach where we base inference on the conditional distribution of the observed data, given the trait. Under model 𝒜, the conditional likelihood of the observed data for a single family, given the offspring affection status, equals
where
(2.2)
(2.3)
with and a known and unknown density, respectively. Here, we make use of the fact that . We do not propose estimation based on the conditional likelihood as this would require modeling the joint density . This is undesirable (Gauderman, 2003) since our goal is to obtain tests for gene–environment interaction that are consistent regardless of such model. Furthermore, we show in Appendix A of the supplementary material available at Biostatistics online that even if one were to use the information available through the law , then, in the absence of modeling restrictions on , one would not obtain feasible estimators with greater efficiency.
In Appendix B of the supplementary material available at Biostatistics online, we show that the locally efficient score test for in model 𝒜 equals , where
(2.4)
In practice, neither nor is usually known. Indeed, this is also true for because the main effects of the gene may cause the transmission of alleles to deviate from the Mendelian law and hence may differ from , which is known. When the observed nuclear families are small and there are no incomplete parental marker data, then the unknown population means in the expression for Si can be substituted by nonparametric estimators
and
obtained as the corresponding parental mating type–specific sample means. In the case of affected case–parent triads without missing parental mating types, the resulting score test equals the FBAT-I test (Lake and Laird, 2003). It thus follows as a corollary from our calculations in Appendix B of the supplementary material (available at Biostatistics online) that the FBAT-I test is the efficient score test for gene–environment interaction under model 𝒜. Our results further express the asymptotic distribution of the locally efficient score test, complementing the exact permutation-based distribution used by Lake and Laird (2003). Indeed, because Si remains unbiased under the null hypothesis when either or is misspecified, nonparametric estimation of these expectations does not affect the asymptotic distribution of the test statistic (Newey and McFadden, 1994). We conclude that
with
is asymptotically normally distributed with mean zero under the null hypothesis and with variance which can be consistently estimated by the sample variance of
.
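The steps above (centering by parental mating type–specific sample means, summing the family contributions, and standardizing by the sample variance) can be sketched in code. This is a minimal illustration, not the authors' implementation: it assumes a scalar exposure, one affected offspring per family with complete parental genotypes, and a family contribution of the product-of-residuals form used by FBAT-I-type statistics, since the exact expression for Si is not reproduced in this text.

```python
import math
from collections import defaultdict
from statistics import mean, variance

def interaction_score_test(triads):
    """Standardized score test from case-parent triads.

    triads: list of (mating_type, x, z) with x the coded offspring genotype
    and z a scalar exposure. Returns (z_statistic, two_sided_p_value).
    Assumes the contributions are not all identical (nonzero sample variance).
    """
    by_type = defaultdict(list)
    for mating_type, x, z in triads:
        by_type[mating_type].append((x, z))
    # Mating-type-specific sample means of genotype and exposure.
    centers = {
        t: (mean(x for x, _ in obs), mean(z for _, z in obs))
        for t, obs in by_type.items()
    }
    # Assumed family contribution: product of the two centered quantities.
    scores = [(x - centers[t][0]) * (z - centers[t][1]) for t, x, z in triads]
    n = len(scores)
    z_stat = sum(scores) / math.sqrt(n * variance(scores))
    p_value = math.erfc(abs(z_stat) / math.sqrt(2))
    return z_stat, p_value

# Hypothetical toy data: mating types "A" and "B" are arbitrary labels.
toy = [("A", 0, 1), ("A", 1, 2), ("A", 2, 3), ("B", 0, 5), ("B", 2, 5)]
print(interaction_score_test(toy))
```

With realistic data, mating types containing very few families contribute little information because their centered values are close to zero by construction.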
When, as often, some of the observed nuclear families carry more than 1 affected offspring and/or parental genotypes are incomplete, then the FBAT-I test statistic cannot be used. For nuclear families with multiple affected offspring, a similar derivation as in Appendix A of the supplementary material (available at Biostatistics online) shows that feasible efficient estimators can be obtained by conducting maximum conditional likelihood inference. However, this requires models for the joint distribution . We choose not to postulate such models because (a) they are difficult to specify; (b) their additional features, over and above those given by the marginal distributions, are not in themselves of scientific interest; (c) their misspecification could lead to inconsistent estimators of the genetic effect; and (d) estimators of the genetic effect under such models are computationally more demanding. Instead, we propose to conduct inference based on the partial pseudo-likelihood which, for the ith nuclear family, is given by
(2.5)
under the realistic assumption that the jth offspring's genotype is independent of the other offspring's disease status and environmental exposures, conditional on its own disease status, environmental exposures, and parental mating types. This suggests testing the null hypothesis using the test statistic
with
redefined as
(2.6)
Furthermore, when the parental genotypes are incomplete, then, using similar arguments as in Rabinowitz and Laird (2000) and Hoffmann and others (2009), it can be shown that our results continue to be valid when the parental marker data are substituted by the sufficient statistic for the parental genotype distribution (which, with a slight abuse of notation, we continue to denote with 𝒮i). A key consideration in this derivation is that the assumed conditional independence of the offspring exposure and genotype, given the parental marker data, implies conditional independence given the sufficient statistic (Hoffmann and others, 2009).
Unfortunately, the data available to estimate the required conditional expectations and will be sparse when some of the observed nuclear families have moderate or large sizes and the parental genotypes are incomplete because this can make the sufficient statistic 𝒮i high-dimensional. In that case, the locally efficient score test based on
may have an erratic behavior. Due to the curse of dimensionality, we are therefore usually forced to place more stringent dimension-reducing modeling restrictions on the law of the observed data.
We consider 2 possible dimension-reducing strategies. The first strategy is to assume that in addition to model 𝒜, model ℬ for the main genetic effect holds, which is defined by
(2.7)
where is a known function, smooth in η, and is an unknown finite-dimensional parameter. Furthermore, assume that with representing (without loss of generality) a reference category for the offspring genotype. In practice, the usual model of choice for the genetic main effect will only contain the main effect of the offspring genotype, that is, . Although there may be interactions with 𝒮i, these may be difficult to interpret. For instance, the interaction between offspring genotype and the expected offspring genotype is reflected in
, but its interpretation is hindered by the fact that the expected marker coding is partly determined by the degree of information on the parental genotype.
The parameter can be consistently estimated under the null hypothesis of no gene–environment interaction by maximizing the partial pseudo-likelihood (2.5), that is, by solving
for η, where under the null hypothesis, for an arbitrary function ,
We denote the resulting estimator by
. A consistent estimator of under the null hypothesis of no gene–environment interaction is then obtained as
and can be substituted in expression (2.6).
The second strategy is to assume that model 𝒞 holds, which is defined by a parametric model for the conditional mean of the environmental exposures given the trait and parental mating types
(2.8)
where is a known function smooth in θ and is an unknown finite-dimensional parameter; for example, or logit( if the parental genotype is observed and with replaced by otherwise. It is easily shown that model is a well-defined model for the observed data in the sense that there exists an observed data law satisfying the restrictions of this model (this is because model 𝒜 does not restrict the conditional association between Y and Z, given 𝒮). A consistent estimator
of can be obtained by solving generalized estimating equations whose estimating function for the ith family equals
A consistent estimator for under the null hypothesis of no gene–environment interaction is obtained as
and can be substituted in expression (2.6).
In practice, one of the 2 models (2.7) or (2.8) needs to be imposed because of the curse of dimensionality. Interestingly, the test statistic
where
attains the nominal α-level so long as model 𝒜 and at least one of the models (2.7) or (2.8) are correctly specified (i.e. so long as the union model ℬ ∪ 𝒞 holds in addition to model 𝒜) because the test statistic has mean zero under the null hypothesis as soon as one of the expectations or is known correctly but not necessarily both. This is desirable because it can be difficult to specify biologically realistic models for the exposure, in which case protection against bias is offered under a correctly specified model for the genetic effect. In line with Robins and Rotnitzky (2001), we call the resulting test a doubly robust test for gene–environment interaction. It further follows from Appendix B of the supplementary material (available at Biostatistics online) that, when , (2.6) is the efficient score for β under model 𝒜 (and following the general results in Robins and Rotnitzky (2001) then also in model ) at the intersection model (i.e. when both models (2.7) and (2.8) are correctly specified).
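The mean-zero property behind double robustness can be checked with a small exact calculation. The sketch below rests on two assumptions that we state explicitly: the statistic is taken to have a product-of-residuals form, and the genotype and exposure are taken to be independent within a stratum under the null hypothesis. The distributions and the misspecified means are hypothetical.

```python
from fractions import Fraction as F

# Illustrative within-stratum distributions (assumed values, not from the study).
# Under the null, genotype X and exposure Z are taken to be independent given
# the affection status and the parental mating type.
PX = {0: F(1, 4), 1: F(1, 2), 2: F(1, 4)}   # true E(X) = 1
PZ = {0: F(7, 10), 1: F(3, 10)}             # true E(Z) = 3/10

def mean_score(a, b):
    """Exact E[(X - a)(Z - b)] when X and Z are independent."""
    return sum(px * pz * (x - a) * (z - b)
               for x, px in PX.items() for z, pz in PZ.items())

TRUE_EX = sum(x * p for x, p in PX.items())
TRUE_EZ = sum(z * p for z, p in PZ.items())
WRONG_EX, WRONG_EZ = F(3, 2), F(1, 2)       # deliberately misspecified means

print(mean_score(TRUE_EX, WRONG_EZ))   # genotype mean correct, exposure mean wrong
print(mean_score(WRONG_EX, TRUE_EZ))   # exposure mean correct, genotype mean wrong
print(mean_score(WRONG_EX, WRONG_EZ))  # both means wrong
```

Under independence the expectation factorizes into the product of the two biases, so it vanishes whenever either working mean is correct and is nonzero only when both are wrong.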
The asymptotic distribution of the doubly robust test
must acknowledge that and θ* are estimated. It is derived in Appendix B of the supplementary material (available at Biostatistics online) and summarized for the case of in Theorem 2.1 below. We further show in the Appendix of the supplementary material (available at Biostatistics online) that when model 𝒜 and both models (2.7) and (2.8) hold, this distribution is the same regardless of how η* and θ* were estimated and regardless of whether they were known. In practice, it thus follows that the choice of estimators for η* and θ* will have little or no impact on the power of the doubly robust test. Because of the smoothness of the score function, the regularity conditions required for the bootstrap can be expected to hold so that one can also use the bootstrap as a tool for inference.
THEOREM 2.1
Suppose that the regularity conditions in Appendix B of the supplementary material (available at Biostatistics online) hold and that . Let Ti be the doubly robust test statistic
, corrected for estimation of η* and θ*:
with
and
which is equal to
under the null hypothesis of no gene–environment interaction.
Then under model , we have that the score function Ti satisfies
under the null hypothesis of no gene–environment interaction, where Σ can be consistently estimated by the sample variance,
, of Ti. It follows under the null hypothesis that
2.3. Estimation
Once a genetic effect and/or gene–environment interaction have been established, it is of interest also to quantify the magnitude of these effects. Estimates for these follow as a simple by-product of our proposed approach. In particular, η* and β* can be estimated under model by jointly maximizing the partial pseudo-likelihood (2.5). We thus obtain the following pseudo-scores for (η*,β*):
with, for an arbitrary function ,
A consistent estimator
for (η*,β*) is thus obtained by solving
It follows from standard asymptotic theory (e.g. Arnold and Strauss, 1991) that under mild regularity conditions, the estimator
in model 𝒜 will be a consistent and asymptotically normal estimator and that the asymptotic covariance matrix of
equals
with all terms evaluated at and . This can be consistently estimated by replacing η* and β* by
and
, respectively, and population variances and expectations by sample analogs.
In Sections 3 and 4, we will consider the choice , for which
and
REMARK
The previously considered estimator
of the gene–environment interaction β* does not satisfy the aforementioned double robustness property in the sense that it may fail to be a consistent estimator of β* as soon as the genetic main effect is misspecified. Vansteelandt, VanderWeele, and others (2008) develop doubly robust estimators of interaction parameters based on random population samples. It follows from their results that when are randomly sampled conditional on , then with
is an unbiased estimating equation for β* in model 𝒜 when either is correctly specified or and are correctly specified. Because implies , the estimator of β* obtained by solving continues to be valid for the analysis of family studies of affected offspring, but only provided that the expected exposure level is a priori known or, more realistically, can be consistently estimated from external data (e.g. from additional data on unaffected offspring under a rare disease assumption). The requirement of being able to specify or estimate is needed because the factorization in the numerator of
suggests that and cannot be separately identified nonparametrically from the observed data distribution. Unlike the estimator proposed earlier in this section, the resulting estimator of β* is then doubly robust in the sense of being a consistent estimator of β* when either or is correctly specified. The parameters indexing can be estimated by solving . Having chosen a parametric model for , the parameters indexing this model can be solved from a similar score function.
3. SIMULATION STUDY
Using simulation studies, we examine the properties of the doubly robust test for gene–environment interaction. We assess the type I error rate and power of the methodology for realistic sample sizes () under several scenarios without missing parental genotype and scenarios with incomplete parental genotypes, both in the absence and presence of population admixture. For case–parent triads without missing parental genotypes, we compare the power with the power of FBAT-I (Lake and Laird, 2003). To gain insight into the doubly robust nature of our proposed test, we evaluate scenarios where both the model for the genetic main effect and the conditional mean model for the exposure hold, scenarios where only one of the models holds and scenarios where both models are misspecified. More details on the simulation set-up and results can be found in the supplementary material available at Biostatistics online.
In summary, we find that the doubly robust test maintains the nominal significance level well and provides results that are very comparable with those of the FBAT-I test (Lake and Laird, 2003) with respect to power. A major advantage is that the proposed test can be used with missing parental genotypes. While the proposed test can suffer numerical instability in finite samples, this can be overcome by ignoring adjustment for nuisance parameter estimation at the risk of a slight loss of power.
4. APPLICATION TO A FAMILY-BASED STUDY OF ALZHEIMER'S DISEASE
In this section, we analyze family data on individuals affected with Alzheimer's disease. The data have been previously analyzed in Mokliatchouk and others (2000) and Lange and others (2004). For this analysis, data on 308 nuclear families with 2–10 siblings were available. The study subjects were ascertained so that each family contains at least one affected proband. While the genotypes of all siblings in a family are known, the parental genotypes are not available.
In a first step, we test whether there is an interaction between the a2m gene and the ApoE gene, a gene known to be linked to Alzheimer's disease. To this end, we use the doubly robust test for gene–environment interaction with ApoE as environmental variable. Note that the assumption of independence between a2m and ApoE, conditional on the parental genotype, is reasonable since the two genes are located on different chromosomes. Next, we test for an interaction between the a2m gene and age to investigate whether the association between the a2m genotype and risk for Alzheimer's disease changes with age.
a2m is a bi-allelic gene (alleles “a2m-1” and “a2m-2”), with the target allele being a2m-2. In our coding, counts the number of a2m-1 alleles. Because there are missing parental genotypes, we first obtain the distribution of the a2m genotype conditional on the sufficient statistic for the 805 probands of the 308 pedigrees. We then retain the affected offspring, which leaves 616 individuals from 308 pedigrees. We consider the penetrance function (2.1) in which the genetic main effect of a2m is modeled as . Under the assumption of no gene–environment interaction, the methods in Section 2.2 yield a point estimate of
= −0.254 (95% confidence interval [CI] −0.40 to −0.11).
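To read this estimate on the relative risk scale, one can exponentiate the point estimate and the confidence limits. The sketch below assumes that the main effect enters the RR model log-linearly in the allele count, so that the exponentiated estimate is the relative risk per extra copy of the counted allele; the function name is ours.

```python
import math

def to_relative_risk(log_rr, lower, upper):
    """Exponentiate a log relative risk and its confidence limits."""
    return tuple(math.exp(v) for v in (log_rr, lower, upper))

# Reported estimate and 95% CI for the a2m main effect (log scale).
rr, rr_lower, rr_upper = to_relative_risk(-0.254, -0.40, -0.11)
print(round(rr, 2), round(rr_lower, 2), round(rr_upper, 2))
```

Under this assumption, the entire interval lies below 1 on the relative risk scale, in line with the interval for the log relative risk excluding zero.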
Let represent the genotype for ApoE in individual j of pedigree i (). We then consider the following model for the conditional mean of :
in which we assume a logistic model for containing the main and quadratic effects of ( and (the conditional offspring genotype distribution) and their interaction term. The parameter θ* is estimated by solving the corresponding estimating equations. Note that for each individual, there are 2 observations for the presence or absence of allele corresponding to the 2 inherited parental alleles. This means that θ* is estimated based on 2×616 observations. The corrected test statistic of Theorem 2.1 for the interaction test equals 0.49. This corresponds to a p-value of 0.63. We conclude that there is no significant interaction between the two genes.
Age ranges from 49 to 104 years with mean 79.4. Let represent the age of individual j in pedigree i. We then assume a linear model for the conditional mean of with the main and quadratic effects of and and their interaction term. The corrected test statistic for the interaction test equals 1.60 (the uncorrected test statistic
equals 1.61) corresponding to a p-value of 0.11. We conclude that there is at most marginal evidence that the joint effects of the a2m gene and age are not acting multiplicatively on Alzheimer's disease.
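The p-values quoted for the two interaction tests follow from referring the test statistics to a standard normal distribution. A minimal sketch of the two-sided calculation, using only the Python standard library, is given below. Because the reported statistics are rounded to two decimals, the recomputed p-values may differ from the reported ones in the second decimal.

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

# Reported test statistics: a2m x ApoE, and a2m x age (corrected and uncorrected).
for z in (0.49, 1.60, 1.61):
    print(z, round(two_sided_p(z), 3))
```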
To investigate further the magnitude of this interaction, we simultaneously estimate the size of the main effect of a2m (η*) and the effect size of the interaction with age (β*) following the methods in Section 2.3. This yields a point estimate of
= −1.77 (95% CI −3.71 to 0.18) and
= 0.019 (95% CI to ). Note that
/SE(
) = 1.52 which resembles the doubly robust test statistic for gene–environment interaction.
For a fixed age, the RR of being affected for versus and for versus equals . When increases one unit, this RR is multiplied by . The fact that
is positive suggests that the estimated RR increases for increasing age. In particular, for an age of 93 years, the estimated RR equals 1. For younger subjects, the estimated RR is smaller than 1 so that an extra a2m-2 allele is associated with a higher risk for Alzheimer's disease. For subjects older than 93 years, each extra copy of a2m-2 corresponds to a lower risk. Figure 2 shows the estimated relative risk
as a function of age , together with the corresponding 95% CI. We find that the RR is only significantly different from 1 for ages less than 84 years. It follows that there is mild evidence of a genetic effect for subjects younger than 84 in the sense that each extra copy of a2m-2 is associated with a higher risk for Alzheimer's disease.
Fig. 2.
Relative risk (solid line) of Alzheimer's disease for each extra copy of the a2m-1 allele as a function of age, along with 95% pointwise CIs (dashed line).
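The dependence of the relative risk on age can be traced numerically from the two point estimates. The sketch below assumes the log-linear form exp(η + β × age) for the relative risk per extra copy of the counted allele; this form is our reading of the model, chosen because it reproduces the reported crossover near 93 years. It uses point estimates only and does not reproduce the pointwise confidence intervals.

```python
import math

ETA_HAT = -1.77    # reported main-effect estimate
BETA_HAT = 0.019   # reported interaction estimate (per year of age)

def relative_risk(age):
    """Estimated RR per extra copy of the counted allele at a given age (assumed form)."""
    return math.exp(ETA_HAT + BETA_HAT * age)

def crossover_age():
    """Age at which the estimated RR equals 1, i.e. where eta + beta * age = 0."""
    return -ETA_HAT / BETA_HAT

print(round(crossover_age(), 1))
for age in (49, 79.4, 104):   # observed minimum, mean, and maximum age
    print(age, round(relative_risk(age), 2))
```

Because the rounded estimates are used, the curve is only an approximation to the one shown in Figure 2.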
5. DISCUSSION
In this paper, we have developed a doubly robust test for gene–environment interaction. Such a test is “almost” asymptotically distribution-free in the sense that it is valid when either the main genetic effect is correctly specified or a conditional mean model for the environmental exposure (given the trait and the sufficient statistic) but not necessarily both. While the genetic main effect may be of interest in itself, an advantage of using such a doubly robust procedure is that it preserves the type I error rate of the gene–environment interaction test even under misspecification of the main genetic effect, provided that the model for the mean environmental exposure is approximately correct. The proposed test forms a natural extension of the FBAT-I test (Lake and Laird, 2003) to nuclear families of arbitrary size and with possibly incomplete parental mating types. With concern for bias due to model misspecification, we believe that such a test is generally the best that can be hoped for because asymptotically distribution-free tests for gene–environment interaction may behave erratically due to the curse of dimensionality.
The proposed approach has close links to the analysis of case-only designs in population-based studies (Umbach and Weinberg, 1997; Yang and others, 1999). A major advantage is that it yields estimates of the genetic association, despite the fact that only data from affected offspring are used or available. The central idea leading to those estimates, that is, postulating a model for the trait distribution and estimating the parameters indexing that model from the conditional genotype distribution given the traits, is also found in Liu and others (2002) and references therein, and is implicit in the work of Whittemore (2004). This paper implicitly generalizes these approaches in several ways. First, unlike Whittemore (2004), we enable consistent estimation of the genetic association of interest, regardless of distributional assumptions for the parental genotypes. Second, we guarantee robustness against population admixture by stratifying the analysis on the sufficient statistic for the parental genotype distribution, which is known under the assumption of Mendelian transmission. In the work of Liu and others (2002), such robustness is only guaranteed under the null hypothesis of no genetic association. This is because, for the latter approach to be generally valid, it must be true that the trait is not associated with parental mating types, conditional on the offspring's genotypes. Such association is likely present when there is confounding due to population admixture. However, to guard against population admixture in the presence of incompletely observed parental genotypes, we chose to condition on sufficient statistics for the genotype distribution.
While our estimators are locally efficient under the chosen model which involves such conditioning, potentially more efficient estimators could be obtained by conditioning on the parental genotypes and then projecting the observed data score onto the nuisance tangent space for the distribution of parental allele frequencies, as in Allen and others (2005).
Finally, we wish to remark that the presence of gene–environment interaction, as identified by our test, indicates that the effect of the genotype on the trait is different in different environments (VanderWeele, 2009). Because one cannot intervene on the genotype, a more desirable goal may be to assess whether the effect of the environment on the trait varies with the genotype. Estimates for the latter gene–environment interaction are harder to obtain as they cannot generally be made robust against unmeasured confounding for the association between the trait and environmental exposures.
SUPPLEMENTARY MATERIAL
Supplementary material is available at http://biostatistics.oxfordjournals.org.
FUNDING
Fund for Scientific Research (Belgium) to S.V.; the Interuniversity Attraction Pole research network grant from the Belgian government (Belgian Science Policy) (P06/03 to S.V.); the Special Research Fund (BOF) of Ghent University (01J16607 to B.M.); National Institute of Mental Health (1R01MH081862, 1R01MH087590 to C.L.)
Acknowledgments
This work was partially conducted while Stijn Vansteelandt was visiting the Department of Biostatistics of the Harvard School of Public Health. Conflict of Interest: None declared.
References
- Allen AS, Satten GA, Tsiatis AA. Locally-efficient robust estimation of haplotype-disease association in family-based studies. Biometrika. 2005;92:559–571.
- Arnold BC, Strauss D. Pseudolikelihood estimation: some examples. Sankhya B. 1991;53:233–243.
- Bickel PJ, Klaassen CAJ, Ritov Y, Wellner JA. Efficient and adaptive estimation for semiparametric models. New York: Springer-Verlag; 1993.
- Chatterjee N, Kalaylioglu Z, Carroll RJ. Exploiting gene-environment independence in family-based case-control studies: increased power for detecting associations, interactions and joint effects. Genetic Epidemiology. 2005;28:138–156. doi: 10.1002/gepi.20049.
- Cordell HJ, Barratt BJ, Clayton DG. Case/pseudocontrol analysis in genetic association studies: a unified framework for detection of genotype and haplotype associations, gene-gene and gene-environment interactions, and parent-of-origin effects. Genetic Epidemiology. 2004;26:167–185. doi: 10.1002/gepi.10307.
- Dudbridge F. Likelihood-based association analysis for nuclear families and unrelated subjects with missing genotype data. Human Heredity. 2008;66:87–98. doi: 10.1159/000119108.
- Gauderman WJ. Candidate gene association analysis for a quantitative trait, using parent-offspring trios. Genetic Epidemiology. 2003;25:327–338. doi: 10.1002/gepi.10262.
- Hoffmann TJ, Lange C, Vansteelandt S, Laird NM. Gene-environment interaction tests for dichotomous traits in trios and sibships. Genetic Epidemiology. 2009;33:691–699. doi: 10.1002/gepi.20421.
- Laird NM, Horvath SM, Xu X. Implementing a unified approach to family based tests of association. Genetic Epidemiology. 2000;19:S36–S42. doi: 10.1002/1098-2272(2000)19:1+<::AID-GEPI6>3.0.CO;2-M.
- Lake SL, Laird NM. Tests of gene-environment interaction for case-parent triads with general environmental exposures. Annals of Human Genetics. 2003;68:55–64. doi: 10.1046/j.1529-8817.2003.00073.x.
- Lange C, Blacker D, Laird NM. Family-based association tests for survival and times-to-onset analysis. Statistics in Medicine. 2004;23:179–189. doi: 10.1002/sim.1707.
- Lange C, Laird NM. On a general class of conditional tests for family-based association studies in genetics: the asymptotic distribution, the conditional power, and optimality considerations. Genetic Epidemiology. 2002;23:165–180. doi: 10.1002/gepi.209.
- Liu Y, Tritchler D, Bull SB. A unified framework for transmission-disequilibrium test analysis of discrete and continuous traits. Genetic Epidemiology. 2002;22:26–40. doi: 10.1002/gepi.1041.
- Mokliatchouk O, Blacker D, Rabinowitz D. Association tests for traits with variable age at onset. Human Heredity. 2000;51:46–53. doi: 10.1159/000022959.
- Newey WK, McFadden D. Large sample estimation and hypothesis testing. In: Engle RF, McFadden DL, editors. Handbook of Econometrics. 1994;Volume 4:2180.
- Pearl J. Causal diagrams for empirical research (with discussion). Biometrika. 1995;82:669–710.
- Pearl J. Causality: Models, Reasoning, and Inference. Cambridge: Cambridge University Press; 2000.
- Rabinowitz D, Laird N. Adjusting association test for population admixture with arbitrary pedigree structure and arbitrary missing marker information. Human Heredity. 2000;50:211–223. doi: 10.1159/000022918.
- Robins JM. Data, design and background knowledge in etiologic inference. Epidemiology. 2001;12:313–320. doi: 10.1097/00001648-200105000-00011.
- Robins JM, Rotnitzky A. Comment on the Bickel and Kwon article, “Inference for semiparametric models: Some questions and an answer.” Statistica Sinica. 2001;11:920–936.
- Spielman RS, McGinnis RE, Ewens WJ. Transmission test for linkage disequilibrium: the insulin gene region and insulin-dependent diabetes mellitus (IDDM). American Journal of Human Genetics. 1993;52:506–516.
- Umbach DM, Weinberg CR. Designing and analysing case-control studies to exploit independence of genotype and exposure. Statistics in Medicine. 1997;16:1731–1743. doi: 10.1002/(sici)1097-0258(19970815)16:15<1731::aid-sim595>3.0.co;2-s.
- Umbach DM, Weinberg CR. The use of case-parent triads to study joint effects of genotype and exposure. American Journal of Human Genetics. 2000;66:251–261. doi: 10.1086/302707.
- VanderWeele TJ. On the distinction between interaction and effect modification. Epidemiology. 2009;20:863–871. doi: 10.1097/EDE.0b013e3181ba333c.
- Vansteelandt S, DeMeo DL, Su J, Smoller J, Murphy AJ, McQueen M, Schneiter K, Celedon JC, Weiss ST, Silverman EK and others. Testing and estimating gene-environment interactions in family-based association studies. Biometrics. 2008;64:458–467. doi: 10.1111/j.1541-0420.2007.00925.x.
- Vansteelandt S, VanderWeele T, Tchetgen E, Robins JM. Multiply robust inference for statistical interactions. Journal of the American Statistical Association. 2008;103:1693–1704. doi: 10.1198/016214508000001084.
- Whittemore A. Estimating genetic association parameters from family data. Biometrika. 2004;91:219–225.
- Yang QH, Khoury MJ, Sun FZ, Flanders WD. Case-only design to measure gene-gene interaction. Epidemiology. 1999;10:167–170.