Biostatistics (Oxford, England). 2012 Dec 23;14(3):556–572. doi: 10.1093/biostatistics/kxs048

Incorporating parental information into family-based association tests

Zhaoxia Yu 1,*, Daniel Gillen 1, Carey F Li 2,3, Michael Demetriou 2,3
PMCID: PMC3732025  PMID: 23266418

Abstract

Assumptions regarding the true underlying genetic model, or mode of inheritance, are necessary when quantifying genetic associations with disease phenotypes. Here we propose new methods to ascertain the underlying genetic model from parental data in family-based association studies. Specifically, for parental mating-type data, we propose a novel statistic to test whether the underlying genetic model is additive, dominant, or recessive; for parental genotype–phenotype data, we propose three strategies to determine the true mode of inheritance. We illustrate how to incorporate the information gleaned from these strategies into family-based association tests. Because family-based association tests are conducted conditional on parental genotypes, the type I error rate of these procedures is not inflated by the information learned from parental data. This result holds even if such information is weak or when the assumption of Hardy–Weinberg equilibrium is violated. Our simulations demonstrate that incorporating parental data into family-based association tests can improve power under common inheritance models. The application of our proposed methods to a candidate-gene study of type 1 diabetes successfully detects a recessive effect in MGAT5 that would otherwise be missed by conventional family-based association tests.

Keywords: Case–parents, Dominant, Mode of inheritance, Nuclear families, Recessive, Robust

1. Introduction

In genetic association studies, especially genome-wide association studies (GWASs), genetic markers are routinely tested under the assumption of additive effects. For example, the commonly used transmission disequilibrium test (TDT) (Spielman and others, 1993) is a score test assuming multiplicative genotype-relative risks (GRRs; additive on the log-scale) using conditional logistic regressions (Schaid and Sommer, 1993; Self and others, 1991). For case-control studies, when Hardy–Weinberg equilibrium (HWE) holds, the widely used allelic test is asymptotically equivalent to the Cochran–Armitage trend test, a score test corresponding to a logistic regression model under the assumption of multiplicative odds ratios (additive on the log-scale) (Sasieni, 1997). Although convenient to use, these tests are optimal only when the true underlying genetic model is additive and they are not robust against model misspecification.

One alternative to testing under an assumed additive effect model is to specify a reference genotype and consider an unrestricted test with separate association parameters for the other two genotypes of a single nucleotide polymorphism (SNP). Under the null hypothesis, the resulting statistic from a score or likelihood ratio test of no difference across genotypes asymptotically follows the chi-squared distribution with two degrees of freedom (hereinafter referred to as 2DF). Although robust, the two-degree-of-freedom test suffers from reduced power when the true underlying genetic model is additive. This power loss is a real concern because it has been suggested that the additive model is likely to be the most common model for complex traits (Hill and others, 2008). Furthermore, even if the true model is non-additive, the non-additivity can rapidly be distorted by imperfect linkage disequilibrium (Zheng and others, 2009; Vukcevic and others, 2011; Bhangale and others, 2008; Sham and others, 2000), with the rate of distortion proportional to the fourth power of linkage disequilibrium (Vukcevic and others, 2011). Thus, it is important to develop tests that are both robust against model misspecification and powerful for additive effects.

Several robust methods for genetic associations have been proposed and studied in the past 10 years. The maximin efficiency robust test (MERT) uses a linear combination of the test statistics of the three well-known genetic models, namely additive, dominant, and recessive (Gastwirth and Freidlin, 2000; Freidlin and others, 2002; Zheng and others, 2002). A simple maximum of the three test statistics (MAX3) has also been considered (Gonzalez and others, 2008; Zheng and others, 2002; Freidlin and others, 2002). Simulation studies indicate that MAX3 is generally more powerful than MERT and 2DF (Zheng and others, 2002; Li and others, 2009; Lettre and others, 2007).

It has been noted that the signs of the HWE coefficients in cases and controls provide information about the true underlying genetic model (Wittke-Thompson and others, 2005). Statistical tests that incorporate the difference in HWE coefficients between cases and controls have been used to either improve power or construct robust tests of genetic association (Song and Elston, 2006; Zheng and Ng, 2008; Joo and others, 2010; Zheng and others, 2007; Xu and others, 2012). In particular, Zheng and Ng (2008) used this information to choose the most likely genetic model, whereas Joo and others (2010) used it to eliminate the most unlikely genetic model. When the disease risk allele is known, other authors have considered using an order-restricted genetic model (Wang and Sheffield, 2005; Chen and Ng, 2012; Troendle and others, 2009; Luss and others, 2012). This approach yields similar power to MAX3 (Gonzalez and others, 2008).

All of the above-mentioned methods use the same data to infer the genetic model and to conduct the tests of genetic association. In family-based studies, where association tests are conducted conditional on parental information, or corresponding sufficient statistics, the information on parental-mating-type data is not incorporated into the testing procedure in order to protect against inflation of the false-positive rate caused by population stratification (Horvath and others, 2001; Rabinowitz and Laird, 2000). Although this information is not robust against population stratification, it nevertheless provides a certain amount of information about association parameters. In fact, for quantitative traits, such information has been used for self-replication, deriving prior distributions for the genetic associations of interest, obtaining weights that are used to combine multiple SNPs, or prioritizing markers to reduce the burden of multiple comparisons (Laird and Lange, 2009; Feng and others, 2007; Lange and others, 2003; Van Steen and others, 2005; Ionita-Laza and others, 2007; Qin and others, 2010; Naylor and others, 2010; Xu and others, 2006; Gauderman and others, 2010). This strategy, however, cannot be directly applied when there is no phenotypic variation in the probands, such as in the case–parents design, as the information matrix in parental mating types is ill-conditioned (Murphy and others, 2008). One potential solution is to use a subset of the mating types to estimate the effect sizes (Murphy and others, 2008). The parental phenotypes, when available, can also be used to prioritize SNPs for family-based association tests (Qin and others, 2010). Another way of using parental information is to combine the parental genotype–phenotype information with family-based association tests (Purcell and others, 2005); however, this approach does not guarantee complete robustness against population stratification.

In this article, we focus on incorporating parental information to construct tests that are robust against both population stratification and the true underlying genetic model. Our article is outlined as follows. In Section 2, we present strategies to infer genetic models from parental-mating-type data or parental genotype–phenotype data. In Section 3, we present the results of a simulation study that demonstrates the advantage of our proposed methods. A real example is presented in Section 4, where we show that our methods successfully detect a recessive effect that would not have been detected by standard methods. The article concludes with a discussion of our proposed methodology and avenues for future work.

2. Methods

Consider an SNP that follows HWE in the population. Let A and B denote the risk and the reference allele, respectively, and let p denote the frequency of allele A in the population. When the order of the two parents is ignored, there are six possible mating types, as shown in the second column of Table S1 in supplementary material available at Biostatistics online. We use ni, i=1,2,…,6, to denote the number of couples that fall in the ith mating type among n total couples. Let exp(β0) be the risk of disease for subjects with the BB genotype and define

$$
\Pr(D=1 \mid G=\mathrm{AB}) = e^{\beta_0+\beta_1}, \qquad \Pr(D=1 \mid G=\mathrm{AA}) = e^{\beta_0+\beta_2}.
$$

Thus, exp(β1) is the GRR of genotype AB to genotype BB and exp(β2) is the GRR of genotype AA to genotype BB. Let GP and GO be the genotypes of the parents and the genotypes of the offspring, respectively; let DP and DO be the disease statuses of the parents and the offspring, respectively. In a case–parents study, a family is ascertained because the offspring is affected, i.e., DO=1. The population frequencies of mating types can then be expressed in terms of p, β1, and β2, as shown in Column 1 of Table S1 in supplementary material available at Biostatistics online (Schaid and Sommer, 1993). It has previously been noted that the likelihood of family data can be factored (Clayton, 1999; Whittemore and Tu, 2000). Specifically, with the assumption that the disease statuses of the family members are conditionally independent given their genotypes, the factorization is as follows:

$$
\Pr(G_O, G_P, D_P \mid D_O=1) = \underbrace{\Pr(G_O \mid G_P, D_O=1)}_{L_{NF}(\beta_1,\beta_2)} \times \underbrace{\Pr(G_P \mid D_O=1)}_{L_{M}(p,\beta_1,\beta_2)} \times \underbrace{\Pr(D_P \mid G_P)}_{L_{PGP}(\beta_0,\beta_1,\beta_2)}.
$$

Thus the information in case–parents data can be partitioned into three independent components: LNF(β1,β2), the non-founder likelihood that is used to construct family-based association tests such as the TDT; LM(p,β1,β2), the likelihood based upon mating-type data; and LPGP(β0,β1,β2), the likelihood based upon parental genotype–phenotype data. Note that the βi’s in LPGP(β0,β1,β2) denote log-odds ratios, rather than log-GRRs. Although GRRs and odds ratios are not identical, they are approximately the same for rare diseases, and we expect that they provide similar information regarding the mode of inheritance.

2.1. The non-founder likelihood

The non-founder likelihood leads to a conditional approach (Schaid and Sommer, 1993; Self and others, 1991). By conditioning on parental genotypes,

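$$
L_{NF}(\beta_1,\beta_2) =
\left(\frac{e^{\beta_1}}{e^{\beta_1}+e^{\beta_2}}\right)^{n_{21}}
\left(\frac{e^{\beta_2}}{e^{\beta_1}+e^{\beta_2}}\right)^{n_{22}}
\left(\frac{1}{1+2e^{\beta_1}+e^{\beta_2}}\right)^{n_{40}}
\left(\frac{2e^{\beta_1}}{1+2e^{\beta_1}+e^{\beta_2}}\right)^{n_{41}}
\left(\frac{e^{\beta_2}}{1+2e^{\beta_1}+e^{\beta_2}}\right)^{n_{42}}
\left(\frac{1}{1+e^{\beta_1}}\right)^{n_{50}}
\left(\frac{e^{\beta_1}}{1+e^{\beta_1}}\right)^{n_{51}},
$$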

where nij is the number of families with the ith mating type and jth offspring type, as shown in the first column of Table S1 in supplementary material available at Biostatistics online. The widely used TDT is the score test derived under the assumption of the additive model (Schaid and Sommer, 1993). In this article, the additive model refers to the additivity of the log-GRRs, i.e., β2=2β1, and we denote the resulting test ADD. Two other score tests, DOM, which assumes the dominant model, i.e., β1=β2, and REC, which assumes the recessive model, i.e., β1=0, can be derived similarly (Zheng and others, 2002; Schaid and Sommer, 1993). We define ADD, DOM, and REC to be the signed square-root of the above score statistics, given by

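$$
\mathrm{ADD} = \frac{(n_{22}-n_{21}) + 2(n_{42}-n_{40}) + (n_{51}-n_{50})}{\sqrt{n_2+2n_4+n_5}}, \qquad
\mathrm{DOM} = \frac{(n_{41}+n_{42}-3n_4/4) + (n_{51}-n_5/2)}{\sqrt{3n_4/16+n_5/4}}, \qquad
\mathrm{REC} = \frac{(n_{22}-n_2/2) + (n_{42}-n_4/4)}{\sqrt{n_2/4+3n_4/16}}.
$$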

Under the null hypothesis of no association, all three test statistics are asymptotically distributed as standard normal random variables. Their asymptotic correlations under the null hypothesis of no association can be derived using the delta method and are estimated by Zheng and others (2002)

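$$
\mathrm{corr}(\mathrm{ADD},\mathrm{DOM}) = \frac{2(n_4+n_5)}{\sqrt{(n_2+2n_4+n_5)(3n_4+4n_5)}}, \qquad
\mathrm{corr}(\mathrm{ADD},\mathrm{REC}) = \frac{2(n_2+n_4)}{\sqrt{(n_2+2n_4+n_5)(4n_2+3n_4)}}, \qquad
\mathrm{corr}(\mathrm{DOM},\mathrm{REC}) = \frac{n_4}{\sqrt{(3n_4+4n_5)(4n_2+3n_4)}}.
$$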

Under the null hypothesis of no association, the joint distribution of the three test statistics can be approximated by the multivariate normal distribution with the above pairwise correlations. The significance of both MERT and MAX3 can be evaluated using this approximation. In MAX3, the maximum of |ADD|, |DOM|, and |REC| is used as a test statistic and its p-value can be calculated using double integrals based on the multivariate normal approximation (Zheng and others, 2002).

Another robust statistic we consider in this article is the score test statistic with two degrees of freedom (Schaid and Sommer, 1993):

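$$
\mathrm{2DF} = (S_1, S_2)
\begin{pmatrix}
\dfrac{n_2+n_4+n_5}{4} & -\dfrac{2n_2+n_4}{8} \\
-\dfrac{2n_2+n_4}{8} & \dfrac{4n_2+3n_4}{16}
\end{pmatrix}^{-1}
\begin{pmatrix} S_1 \\ S_2 \end{pmatrix},
$$

where S1=(n21−n2/2)+(n41−n4/2)+(n51−n5/2), S2=(n22−n2/2)+(n42−n4/4). This test has no restriction on either β1 or β2. Under the null hypothesis, the test statistic is asymptotically distributed as a chi-squared random variable with two degrees of freedom.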
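As an illustration, the score statistics of this section can be computed directly from the trio counts. The following is a minimal sketch, not the authors' implementation. It assumes that mating types 2, 4, and 5 are AA x AB, AB x AB, and AB x BB, and that the offspring index j counts the number of A alleles; both assumptions are inferred from the form of S1 and S2, since Table S1 is not reproduced here. The function name and the dictionary-based input are hypothetical.

```python
import math

def family_score_tests(n2, n4, n5):
    """Signed score statistics from case-parents trio counts.

    Each argument maps the child's number of A alleles (0, 1, 2) to a family
    count, for the assumed mating types AA x AB (n2), AB x AB (n4) and
    AB x BB (n5). Assumes enough informative families that every variance
    below is positive.
    """
    N2, N4, N5 = sum(n2.values()), sum(n4.values()), sum(n5.values())

    # Observed minus Mendelian-expected counts of AB (s1) and AA (s2) children.
    s1 = (n2.get(1, 0) - N2 / 2) + (n4.get(1, 0) - N4 / 2) + (n5.get(1, 0) - N5 / 2)
    s2 = (n2.get(2, 0) - N2 / 2) + (n4.get(2, 0) - N4 / 4)

    # Null covariance matrix of (s1, s2) under Mendelian transmission.
    v11 = (N2 + N4 + N5) / 4
    v22 = (4 * N2 + 3 * N4) / 16
    v12 = -(2 * N2 + N4) / 8

    # Each one-parameter score is a linear combination a*s1 + b*s2.
    def signed_root(a, b):
        var = a * a * v11 + b * b * v22 + 2 * a * b * v12
        return (a * s1 + b * s2) / math.sqrt(var)

    # Quadratic form (s1, s2) V^{-1} (s1, s2)^T for the unrestricted test.
    det = v11 * v22 - v12 * v12
    two_df = (v22 * s1 * s1 - 2 * v12 * s1 * s2 + v11 * s2 * s2) / det

    return {
        "ADD": signed_root(1, 2),  # beta2 = 2 * beta1
        "DOM": signed_root(1, 1),  # beta1 = beta2
        "REC": signed_root(0, 1),  # beta1 = 0
        "2DF": two_df,
    }
```

Writing each restricted score as a linear combination of S1 and S2 keeps the three one-degree-of-freedom tests and the 2DF test tied to the same covariance matrix. Under this construction ADD coincides with the signed square root of the TDT.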

2.2. The mating-type likelihood

For the mating-type likelihood, we have

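$$
L_{M}(p,\beta_1,\beta_2) \propto \prod_{i=1}^{6} \Pr(\text{mating type } i \mid D_O=1)^{n_i},
$$

where the mating-type probabilities are the functions of p, β1, and β2 given in Table S1 in supplementary material available at Biostatistics online.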

The Fisher information matrix of the mating-type likelihood is ill-conditioned (Murphy and others, 2008). However, the following section shows that mating-type data can provide useful information about the true underlying genetic model when we put restrictions on the parameters using the dominant or recessive model.

2.2.1. The information in mating-type data

Proposition 1 —

Mating-type data provide no information for association tests of additive genetic effects.

The assumption of additive effects implies β2=2β1=2β. In Section A of the Supplementary material, we showed that Rao’s efficient score (Rao, 1948) is always zero and the expectation of the determinant of the Fisher information matrix is zero. Thus, under the additive model, there is no information regarding β in mating-type data. At first glance, this might seem surprising. One explanation is that for both the null hypothesis of no genetic association and the additive model, the mating types follow random mating, i.e., the probability of a mating type equals the product of genotype probabilities of two parents. This result is established in Section A of the Supplementary material.
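The random-mating explanation can be checked numerically. The sketch below is illustrative only: it assumes HWE and random mating in the source population, weights each parental pair by the relative risk of a Mendelian offspring, and compares the joint frequency of each ordered parental pair with the product of its margins. The function names are hypothetical.

```python
import math

GENOTYPES = ("AA", "AB", "BB")
TRANSMIT_A = {"AA": 1.0, "AB": 0.5, "BB": 0.0}  # chance a parent passes on allele A

def ascertained_parent_pairs(p, beta1, beta2):
    """P(father genotype, mother genotype | affected child), assuming HWE."""
    q = 1 - p
    hwe = {"AA": p * p, "AB": 2 * p * q, "BB": q * q}
    risk = {"BB": 1.0, "AB": math.exp(beta1), "AA": math.exp(beta2)}
    joint = {}
    for f in GENOTYPES:
        for m in GENOTYPES:
            tf, tm = TRANSMIT_A[f], TRANSMIT_A[m]
            child = {"AA": tf * tm,
                     "AB": tf * (1 - tm) + tm * (1 - tf),
                     "BB": (1 - tf) * (1 - tm)}
            # Weight the population mating frequency by the child's relative risk.
            joint[(f, m)] = hwe[f] * hwe[m] * sum(child[g] * risk[g] for g in GENOTYPES)
    total = sum(joint.values())
    return {pair: w / total for pair, w in joint.items()}

def departure_from_random_mating(p, beta1, beta2):
    """Largest gap between a joint frequency and the product of its margins."""
    joint = ascertained_parent_pairs(p, beta1, beta2)
    margin = {g: sum(joint[(g, h)] for h in GENOTYPES) for g in GENOTYPES}
    return max(abs(joint[(g, h)] - margin[g] * margin[h])
               for g in GENOTYPES for h in GENOTYPES)
```

With β2=2β1 the departure is zero up to rounding error, whereas a recessive or dominant model gives a visibly non-zero departure.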

Proposition 2 —

Let

graphic file with name M8.gif

where Inline graphic, Inline graphic, and Inline graphic. Z follows the standard normal distribution asymptotically when the GRRs are log-additive, i.e., β2=2β1. Furthermore, the sign of its expectation depends on the true underlying genetic model. Specifically, sign(E[Z])= sign(β2−2β1). As a result, the expectation of Z is positive when the true model is recessive and negative when the true model is dominant.

This test statistic was motivated by the fact that the score under the recessive genetic model has a zero expectation not only when β2=β1=0, but also when β2=2β1>0. Based on this observation, we can normalize the score to obtain Z as a test of the hypothesis that the true GRRs are log-additive. The derivation and proof are provided in Section B of the Supplementary material.

In the numerator of Z, Inline graphic is the expected frequency of AA children under non-preferable transmission given parental mating-type data, while Inline graphic is the expected frequency of AA children under non-preferable transmission given the pool of parental alleles. The latter term also implies random mating of parents and is therefore stronger than the first. It follows that the difference is expected to be zero when the ascertained parents show random mating. We have shown in Section A of the Supplementary material that the ascertained parents show random mating when the true underlying effect is additive. As the additive model includes no association as a special case, Z is asymptotically distributed as a standard normal random variable when there is either no genetic association or an additive association, and it has a non-zero mean when the two alleles interact with one another.
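The sign behaviour of the numerator can be illustrated at the population level. The sketch below is a hedged illustration rather than the statistic itself: it uses expected mating-type frequencies in place of observed counts, takes the second term to be the squared frequency of allele A among pooled parental alleles (an assumption based on the verbal description above), and omits the normalizing denominator of Z, which is not reproduced here. The function name and mating-type labels are hypothetical.

```python
import math

def expected_z_numerator(p, beta1, beta2):
    """Population analogue of the numerator of Z (no normalisation)."""
    q = 1 - p
    e1, e2 = math.exp(beta1), math.exp(beta2)
    # Unnormalised frequencies of the six unordered mating types among
    # parents of an affected child, assuming HWE and random mating.
    w = {
        "AAxAA": p**4 * e2,
        "AAxAB": 2 * p**3 * q * (e1 + e2),
        "AAxBB": 2 * p**2 * q**2 * e1,
        "ABxAB": p**2 * q**2 * (1 + 2 * e1 + e2),
        "ABxBB": 2 * p * q**3 * (1 + e1),
        "BBxBB": q**4,
    }
    total = sum(w.values())
    m = {k: v / total for k, v in w.items()}

    # Expected frequency of AA children under Mendelian transmission,
    # given the mating types.
    aa_given_mating = m["AAxAA"] + m["AAxAB"] / 2 + m["ABxAB"] / 4

    # Frequency of allele A in the pooled parental alleles.
    a_freq = (m["AAxAA"] + 0.75 * m["AAxAB"] + 0.5 * m["AAxBB"]
              + 0.5 * m["ABxAB"] + 0.25 * m["ABxBB"])

    return aa_given_mating - a_freq**2
```

For p=0.5 and a GRR of 2, the difference is positive under the recessive model, negative under the dominant model, and zero under a log-additive model, in line with sign(E[Z])=sign(β2−2β1).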

2.2.2. Model selection and elimination based on mating-type data

Proposition 2 indicates that Z provides useful information about the true underlying model. To take advantage of this, we propose a model-selection-elimination (MSE) approach, which is depicted in Figure 1. When the absolute value of Z is not large (here, no larger than the cutoff value of 1), the additive model is highly plausible and we perform ADD. When there is sufficient evidence for one extreme model from Z (an absolute value above the cutoff of 2), we perform the corresponding extreme test, REC for positive Z and DOM for negative Z. When there is moderate evidence supporting one extreme model (an absolute value between 1 and 2), we eliminate the opposite model and evaluate the significance based on the more significant of the two remaining models. For example, when Z is 1.5, we take the larger value of |ADD| and |REC| and, similar to MAX3, we use the asymptotic correlation corr(ADD, REC) (provided in Section 2.1) and the cumulative distribution function of a bivariate normal distribution to obtain the resulting p-value. The Z cutoffs of 1 and 2 are chosen so that, when the true model is additive, we have approximately a 70% chance of choosing the correct model and a 5% chance of choosing one of the two extreme models. MSE uses information in mating-type data to choose a plausible model and uses transmission to ultimately test the genetic association. Importantly, because the information in mating is independent of that in transmission, the size of the MSE testing procedure is not inflated relative to the nominal type I error rate.

Fig. 1. The Model-Selection-Elimination (MSE) approach.
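The decision rule can be sketched in code. This is a minimal illustration under stated assumptions, not the authors' implementation: the handling of the exact boundary values |Z|=1 and |Z|=2 is a guess, the function names are hypothetical, and the bivariate normal probability is obtained by Simpson's rule on a one-dimensional integral rather than by any particular library routine.

```python
import math

def _pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def _cdf(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def max2_pvalue(c, rho, steps=2000):
    """P(max(|X|, |Y|) >= |c|) for standard normals with correlation |rho| < 1."""
    c = abs(c)
    if c == 0:
        return 1.0
    s = math.sqrt(1 - rho * rho)

    # Density of X times P(|Y| <= c | X = x); integrate over |x| <= c.
    def f(x):
        return _pdf(x) * (_cdf((c - rho * x) / s) - _cdf((-c - rho * x) / s))

    h = 2 * c / steps  # steps must be even for Simpson's rule
    total = f(-c) + f(c)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(-c + k * h)
    return 1 - total * h / 3

def mse_test(z, add, dom, rec, corr_add_dom, corr_add_rec, low=1.0, high=2.0):
    """Return (test used, two-sided p-value) following the MSE rule."""
    if abs(z) <= low:   # additive model plausible
        return "ADD", 2 * (1 - _cdf(abs(add)))
    if z > high:        # strong evidence for recessive
        return "REC", 2 * (1 - _cdf(abs(rec)))
    if z < -high:       # strong evidence for dominant
        return "DOM", 2 * (1 - _cdf(abs(dom)))
    if z > 0:           # moderate evidence for recessive: eliminate DOM
        return "max(ADD, REC)", max2_pvalue(max(abs(add), abs(rec)), corr_add_rec)
    return "max(ADD, DOM)", max2_pvalue(max(abs(add), abs(dom)), corr_add_dom)
```

Because Z is computed from mating-type data only, the branch taken does not depend on the transmission data that enter ADD, DOM, and REC.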

2.2.3. When the risk allele is unknown

The sign of E[Z] depends on the sign of β2−2β1. In the previous section, we assumed that the reference allele, allele B in our notation, is the low-risk allele and the test allele, allele A in our notation, is the high-risk allele. This implies that β1≥0 and β2≥0. In this case, β2−2β1>0 holds for a recessive model, while β2−2β1<0 holds for a dominant model. In contrast, when the test allele is a low-risk allele, we have β1≤0 and β2≤0, which leads to E[Z] being positive for a dominant model and E[Z] being negative for a recessive model. A similar behavior was observed with the HWE test in a case–control design (Joo and others, 2010). Thus, when the risk allele is unknown, care must be taken to distinguish between the dominant and recessive model. Here, we use one allele arbitrarily as the reference allele and the other one as the test allele. We first calculate ADD and Z, then replace Z with sign(ADD)Z. This ensures that a positive Z indicates a recessive effect and a negative Z indicates a dominant effect.

2.3. The parental genotype–phenotype likelihood

When phenotypes of parents are available, they may provide useful information regarding the true underlying genetic model. Let r0, r1, and r2 denote the number of affected parents with genotype BB, AB, and AA, respectively, and let s0, s1, and s2 denote the number of unaffected parents with genotype BB, AB, and AA, respectively. The parental genotype–phenotype likelihood is then:

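$$
L_{PGP}(\beta_0,\beta_1,\beta_2) =
\frac{e^{\beta_0 r_0}}{(1+e^{\beta_0})^{r_0+s_0}} \times
\frac{e^{(\beta_0+\beta_1) r_1}}{(1+e^{\beta_0+\beta_1})^{r_1+s_1}} \times
\frac{e^{(\beta_0+\beta_2) r_2}}{(1+e^{\beta_0+\beta_2})^{r_2+s_2}}.
$$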

Similar to the non-founder likelihood, the parental genotype–phenotype likelihood can be used to construct tests with different assumptions on β1 and β2. One convenient test is the Armitage trend test (Armitage, 1955), which is the score test that is commonly used for case–control association studies (Sasieni, 1997). Let 2DFPGP, ADDPGP, DOMPGP, and RECPGP be the score tests under no restrictions, an additive effect, a dominant effect, and a recessive effect, respectively. Here, the subscript “PGP” stands for “parental genotype–phenotype,” and we consider the signed square-root of the test statistics for ADDPGP, DOMPGP, and RECPGP. When phenotypes of parents are available, we first perform the above-mentioned four tests and then use the results to determine which test will be performed for the non-founder likelihood. We propose three strategies: (i) model-selection (MS); (ii) weighted test (WT); and (iii) model-elimination (ME). Since both mating-type and parental genotype–phenotype data contain information for mode of inheritance, we propose a method, called MSEM-PGP, to combine the information in these data. The details of these methods can be found in Section C of the Supplementary material.
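As a sketch of how the one-degree-of-freedom parental tests might be computed, the signed Cochran–Armitage trend statistic can be evaluated with genotype scores (0, 1, 2), (0, 1, 1), and (0, 0, 1) for the additive, dominant, and recessive models. The score vectors and function names are assumptions made for illustration; the MS, WT, and ME strategies themselves are described in the Supplementary material and are not reproduced here.

```python
import math

# Genotype scores for (BB, AB, AA); assumed coding for each model.
SCORES = {"ADD": (0, 1, 2), "DOM": (0, 1, 1), "REC": (0, 0, 1)}

def trend_z(r, s, scores):
    """Signed Cochran-Armitage trend statistic.

    r: affected parents with genotype (BB, AB, AA)
    s: unaffected parents with genotype (BB, AB, AA)
    """
    n = [ri + si for ri, si in zip(r, s)]
    R, S = sum(r), sum(s)
    N = R + S
    sum_xr = sum(x * ri for x, ri in zip(scores, r))
    sum_xn = sum(x * ni for x, ni in zip(scores, n))
    sum_x2n = sum(x * x * ni for x, ni in zip(scores, n))
    # Score and its null variance, conditional on the margins.
    t = N * sum_xr - R * sum_xn
    var = (R * S / N) * (N * sum_x2n - sum_xn**2)
    return t / math.sqrt(var)

def pgp_tests(r, s):
    """Signed statistics under the additive, dominant and recessive codings."""
    return {name: trend_z(r, s, sc) for name, sc in SCORES.items()}
```

For example, `pgp_tests((10, 20, 30), (30, 20, 10))` gives squared statistics of 20 and 15 for the additive and dominant codings, respectively.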

2.4. Generalization to nuclear families

The proposed methods can also be applied to nuclear families containing more than one affected or unaffected child per family by using the following testing algorithm. As before, we first use either mating-type data or parental genotype–phenotype data to infer the genetic model. Note that the mating-type data provide useful information of the genetic model as long as the families are ascertained through a diseased offspring and parental genotype–phenotype data provide information of the genetic model regardless of ascertainment. We then perform family-based association tests by assuming different genetic models by appropriately coding the genotypes to numerical values or vectors (Horvath and others, 2001). As in Rabinowitz and Laird (2000), missing data in parents are addressed by conditioning on the sufficient statistics. Finally, we use the information of genetic models inferred from parental data to combine family-based association tests under different genetic models.

3. Simulations and simulation results

3.1. Methods of simulations

In most simulations, we assume that 1000 case–parents trios are sampled from a random mating population. Three disease models are considered: additive, dominant, and recessive. The frequencies of the risk allele are chosen at 0.1, 0.2, and 0.3. Two p-value cutoffs are considered in this article: 0.05 and 1.0E−7. A cutoff of 0.05 corresponds to the significance level used for an individual test, and 1.0E−7 corresponds to the significance level used for a GWAS. Noting that the information in the phenotypes of parents depends on the prevalence of the disease in the general population, we consider prevalence values of 5, 10, and 15%. For each combination of parameters, we choose GRRs such that the power of the optimal test is close to 0.9. The GRR parameters are provided in the footnotes of Tables 1–3 and Table S2 in supplementary material available at Biostatistics online. In each scenario, power is empirically estimated using 1000 simulated datasets.

Table 1.

The performance of model selection/elimination when the p-value cutoff is 0.05 and the frequency of the risk allele is 0.1

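Columns: mating-type data (Sel. = selected, Elim. = eliminated); parental phenotypes at prevalence 5%, 10%, and 15% (Most = most likely, Least = least likely). A, D, and R denote the additive, dominant, and recessive models.

| True model | Sel. D | Sel. R | Elim. D | Elim. R | 5% Most A | 5% Most D | 5% Most R | 5% Least A | 5% Least D | 5% Least R | 10% Most A | 10% Most D | 10% Most R | 10% Least A | 10% Least D | 10% Least R | 15% Most A | 15% Most D | 15% Most R | 15% Least A | 15% Least D | 15% Least R |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Add | 0.02 | 0.02 | 0.16 | 0.13 | 0.35 | 0.32 | 0.23 | 0.06 | 0.29 | 0.65 | 0.49 | 0.34 | 0.15 | 0.02 | 0.21 | 0.77 | 0.62 | 0.28 | 0.09 | 0 | 0.15 | 0.85 |
| Dom | 0.06 | 0 | 0.05 | 0.28 | 0.25 | 0.44 | 0.18 | 0.11 | 0.16 | 0.72 | 0.31 | 0.56 | 0.08 | 0.05 | 0.09 | 0.86 | 0.33 | 0.62 | 0.03 | 0.02 | 0.05 | 0.93 |
| Rec§ | 0 | 0.58 | 0.28 | 0 | 0.11 | 0.09 | 0.64 | 0.19 | 0.58 | 0.23 | 0.08 | 0.03 | 0.83 | 0.16 | 0.74 | 0.09 | 0.05 | 0.01 | 0.92 | 0.13 | 0.83 | 0.04 |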

Inline graphic; Inline graphic; Inline graphic.

Table 2.

The performance of model selection/elimination when the p-value cutoff is 0.05 and the frequency of the risk allele is 0.2

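Columns: mating-type data (Sel. = selected, Elim. = eliminated); parental phenotypes at prevalence 5%, 10%, and 15% (Most = most likely, Least = least likely). A, D, and R denote the additive, dominant, and recessive models.

| True model | Sel. D | Sel. R | Elim. D | Elim. R | 5% Most A | 5% Most D | 5% Most R | 5% Least A | 5% Least D | 5% Least R | 10% Most A | 10% Most D | 10% Most R | 10% Least A | 10% Least D | 10% Least R | 15% Most A | 15% Most D | 15% Most R | 15% Least A | 15% Least D | 15% Least R |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Add | 0.02 | 0.02 | 0.14 | 0.12 | 0.34 | 0.26 | 0.25 | 0.07 | 0.36 | 0.57 | 0.50 | 0.28 | 0.18 | 0.02 | 0.32 | 0.66 | 0.59 | 0.25 | 0.14 | 0.01 | 0.29 | 0.70 |
| Dom | 0.11 | 0 | 0.03 | 0.34 | 0.24 | 0.47 | 0.14 | 0.13 | 0.19 | 0.68 | 0.27 | 0.60 | 0.08 | 0.07 | 0.10 | 0.83 | 0.26 | 0.69 | 0.03 | 0.02 | 0.06 | 0.92 |
| Rec§ | 0 | 0.42 | 0.34 | 0 | 0.13 | 0.11 | 0.56 | 0.16 | 0.59 | 0.25 | 0.12 | 0.06 | 0.75 | 0.14 | 0.74 | 0.12 | 0.10 | 0.02 | 0.85 | 0.11 | 0.83 | 0.06 |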

Inline graphic; Inline graphic; Inline graphic.

Table 3.

The performance of model selection/elimination when the p-value cutoff is 0.05 and the frequency of the risk allele is 0.3

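Columns: mating-type data (Sel. = selected, Elim. = eliminated); parental phenotypes at prevalence 5%, 10%, and 15% (Most = most likely, Least = least likely). A, D, and R denote the additive, dominant, and recessive models.

| True model | Sel. D | Sel. R | Elim. D | Elim. R | 5% Most A | 5% Most D | 5% Most R | 5% Least A | 5% Least D | 5% Least R | 10% Most A | 10% Most D | 10% Most R | 10% Least A | 10% Least D | 10% Least R | 15% Most A | 15% Most D | 15% Most R | 15% Least A | 15% Least D | 15% Least R |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Add | 0.02 | 0.02 | 0.13 | 0.14 | 0.34 | 0.24 | 0.27 | 0.07 | 0.44 | 0.48 | 0.48 | 0.24 | 0.22 | 0.02 | 0.42 | 0.55 | 0.59 | 0.21 | 0.19 | 0.01 | 0.40 | 0.60 |
| Dom | 0.17 | 0 | 0.02 | 0.34 | 0.22 | 0.47 | 0.15 | 0.12 | 0.22 | 0.65 | 0.23 | 0.64 | 0.08 | 0.09 | 0.12 | 0.79 | 0.26 | 0.69 | 0.04 | 0.04 | 0.07 | 0.89 |
| Rec§ | 0 | 0.31 | 0.40 | 0.01 | 0.16 | 0.13 | 0.56 | 0.15 | 0.60 | 0.25 | 0.17 | 0.05 | 0.71 | 0.12 | 0.76 | 0.12 | 0.13 | 0.02 | 0.82 | 0.07 | 0.86 | 0.07 |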

Inline graphic; Inline graphic; Inline graphic.

To study the effect of population stratification, we conduct simulations assuming that trios are sampled from two random mating populations, with the frequencies of the risk allele being 0.1 and 0.2, respectively. In this case, we sample 500 trios from Population 1 and 500 from Population 2.
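A sampling scheme of this kind can be sketched as follows. The code is an illustration under stated assumptions, not the simulation program used for the reported results: it draws trios directly from the joint distribution of parental and offspring genotypes given an affected child, assuming HWE and random mating within each population, and the GRR values passed to `stratified_sample` are placeholders because the values actually used appear in the table footnotes. All function names are hypothetical.

```python
import random

GENOTYPES = ("AA", "AB", "BB")
TRANSMIT_A = {"AA": 1.0, "AB": 0.5, "BB": 0.0}  # chance a parent passes on allele A

def trio_distribution(p, grr1, grr2):
    """P(father, mother, child genotypes | child affected), assuming HWE."""
    q = 1 - p
    hwe = {"AA": p * p, "AB": 2 * p * q, "BB": q * q}
    risk = {"BB": 1.0, "AB": grr1, "AA": grr2}
    weights = {}
    for f in GENOTYPES:
        for m in GENOTYPES:
            tf, tm = TRANSMIT_A[f], TRANSMIT_A[m]
            child = {"AA": tf * tm,
                     "AB": tf * (1 - tm) + tm * (1 - tf),
                     "BB": (1 - tf) * (1 - tm)}
            for c, pc in child.items():
                if pc > 0:  # keep only Mendelian-consistent trios
                    weights[(f, m, c)] = hwe[f] * hwe[m] * pc * risk[c]
    total = sum(weights.values())
    return {trio: w / total for trio, w in weights.items()}

def simulate_trios(n, p, grr1, grr2, rng):
    """Draw n case-parents trios as (father, mother, child) genotype tuples."""
    dist = trio_distribution(p, grr1, grr2)
    trios = list(dist)
    return rng.choices(trios, weights=[dist[t] for t in trios], k=n)

def stratified_sample(rng, grr1=1.3, grr2=1.69):
    """500 trios from each of two populations (risk-allele frequency 0.1 and 0.2).

    The default GRRs are placeholders chosen for illustration only.
    """
    return (simulate_trios(500, 0.1, grr1, grr2, rng)
            + simulate_trios(500, 0.2, grr1, grr2, rng))
```

Passing a seeded generator such as `random.Random(1)` makes a simulated dataset reproducible.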

3.2. Simulation results

3.2.1. Accuracy in selecting or eliminating models

We first examine whether mating-type data and parental genotype–phenotype data provide additional information to facilitate model selection and elimination. A comparison of Tables 1–3 and Table S2 in supplementary material available at Biostatistics online shows that the larger the effect sizes, the larger the probability of selecting the true model and eliminating the wrong model. For example, when the true model is recessive and the frequency of the risk allele is 0.2, the proportion of simulations in which REC was correctly selected based on Z increased from 42% for Inline graphic to 94% for Inline graphic. For parental genotype–phenotype data, the larger the prevalence, the higher the chance of selecting the true model and eliminating the wrong model. For example, when the true model is recessive, the frequency of the risk allele is 0.1 and Inline graphic, the proportion of simulations in which REC is chosen as the most likely model using parental genotype–phenotype data increased from 64% for a 5% prevalence to 92% for a 15% prevalence.

In the presence of population stratification, the information in mating-type data or that in parental genotype–phenotype data tends to predict the model to be recessive (Table S5 in supplementary material available at Biostatistics online). This observation implies that violation of HWE leads to biased inference for the mode of inheritance when using mating-type or parental genotype–phenotype data.

3.2.2. Empirical power

Estimated power is presented in Tables 4–6 and Table S3 in supplementary material available at Biostatistics online. The first three columns yield the empirical power of the tests under the additive, dominant, and recessive model, respectively. It is not surprising that all tests are optimal when the assumed model agrees with the true model, but none of them are robust against model misspecification. Similar to what has been previously reported, we found that MAX3 is more powerful than 2DF in all situations. For large effect sizes (Table S3 in supplementary material available at Biostatistics online), MSE has greater power than MAX3 when the true model is recessive. When the effect sizes are small (Tables 4–6), MSE generally outperforms MAX3 when the true model is additive or dominant, but is less powerful when the true model is recessive. Since the additive model is probably the most common model (Vukcevic and others, 2011; Hill and others, 2008), we expect that MSE would outperform MAX3 in many commonly encountered scenarios.

Table 4.

Estimated power when the p-value cutoff is 0.05 and the frequency of the risk allele is 0.1

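Columns: 2DF and MAX3 are robust tests; MSE uses mating-type data; MS, WT, and ME use parental phenotypes at the stated prevalence.

| True model | ADD | DOM | REC | 2DF | MAX3 | MSE | MS (5%) | WT (5%) | ME (5%) | MS (10%) | WT (10%) | ME (10%) | MS (15%) | WT (15%) | ME (15%) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Null | 0.06 | 0.06 | 0.04 | 0.06 | 0.06 | 0.06 | 0.06 | 0.06 | 0.06 | 0.06 | 0.05 | 0.06 | 0.05 | 0.06 | 0.06 |
| Add | 0.91 | 0.88 | 0.28 | 0.86 | 0.87 | 0.89 | 0.75 | 0.84 | 0.90 | 0.80 | 0.86 | 0.90 | 0.85 | 0.87 | 0.90 |
| Dom | 0.87 | 0.90 | 0.07 | 0.84 | 0.85 | 0.87 | 0.74 | 0.78 | 0.87 | 0.82 | 0.82 | 0.88 | 0.88 | 0.87 | 0.88 |
| Rec | 0.34 | 0.06 | 0.92 | 0.89 | 0.89 | 0.82 | 0.77 | 0.75 | 0.65 | 0.84 | 0.83 | 0.76 | 0.88 | 0.88 | 0.80 |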

The values in bold are the power based on the optimal test.

Table 5.

Estimated power when the p-value cutoff is 0.05 and the frequency of the risk allele is 0.2

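Columns: 2DF and MAX3 are robust tests; MSE uses mating-type data; MS, WT, and ME use parental phenotypes at the stated prevalence.

| True model | ADD | DOM | REC | 2DF | MAX3 | MSE | MS (5%) | WT (5%) | ME (5%) | MS (10%) | WT (10%) | ME (10%) | MS (15%) | WT (15%) | ME (15%) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Null | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 | 0.04 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 | 0.04 |
| Add | 0.90 | 0.82 | 0.44 | 0.83 | 0.86 | 0.88 | 0.76 | 0.86 | 0.88 | 0.78 | 0.86 | 0.88 | 0.81 | 0.87 | 0.88 |
| Dom | 0.86 | 0.91 | 0.08 | 0.85 | 0.86 | 0.88 | 0.77 | 0.81 | 0.87 | 0.83 | 0.84 | 0.88 | 0.88 | 0.88 | 0.89 |
| Rec | 0.44 | 0.07 | 0.90 | 0.85 | 0.85 | 0.77 | 0.75 | 0.71 | 0.68 | 0.80 | 0.79 | 0.76 | 0.84 | 0.84 | 0.80 |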

The values in bold are the power based on the optimal test.

Table 6.

Estimated power when the p-value cutoff is 0.05 and the frequency of the risk allele is 0.3

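Columns: 2DF and MAX3 are robust tests; MSE uses mating-type data; MS, WT, and ME use parental phenotypes at the stated prevalence.

| True model | ADD | DOM | REC | 2DF | MAX3 | MSE | MS (5%) | WT (5%) | ME (5%) | MS (10%) | WT (10%) | ME (10%) | MS (15%) | WT (15%) | ME (15%) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Null | 0.04 | 0.05 | 0.05 | 0.06 | 0.06 | 0.04 | 0.06 | 0.05 | 0.06 | 0.05 | 0.05 | 0.04 | 0.05 | 0.05 | 0.05 |
| Add | 0.91 | 0.81 | 0.56 | 0.85 | 0.87 | 0.90 | 0.78 | 0.88 | 0.89 | 0.80 | 0.87 | 0.89 | 0.82 | 0.87 | 0.88 |
| Dom | 0.80 | 0.91 | 0.08 | 0.84 | 0.85 | 0.85 | 0.74 | 0.76 | 0.84 | 0.82 | 0.81 | 0.86 | 0.86 | 0.85 | 0.87 |
| Rec | 0.55 | 0.06 | 0.91 | 0.84 | 0.86 | 0.78 | 0.73 | 0.71 | 0.72 | 0.80 | 0.79 | 0.79 | 0.84 | 0.84 | 0.83 |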

The values in bold are the power based on the optimal test.

The performance of the three methods that use information from parental genotype–phenotype data is highly dependent on effect size and prevalence. Compared with MSE and the two previously published methods, the MS and WT approaches have lower power when the prevalence of the disease is not high; ME has satisfactory performance when the true model is additive or dominant, especially when the prevalence is high or when the effect size is large. Among the three methods that use parental genotype–phenotype information, ME (the method that eliminates the least likely model) has the best overall performance, and MS (the method that selects the most likely model) has the poorest overall performance.

As expected, the proposed tests maintain the correct type I error rate in the presence of population stratification (Table S6 in supplementary material available at Biostatistics online). However, the power of the tests when the true underlying model is additive or dominant is reduced. This is not surprising, as we have seen that mating-type or parental genotype–phenotype data tend to choose the recessive model in the presence of population stratification.

We also examined the performance of combining information in mating-type and parental genotype–phenotype data. The results in Table S7 in supplementary material available at Biostatistics online show that MSEM-PGP offers little or no improvement over MSE. This agrees with our observation (Tables 4–6 and Table S3 in supplementary material available at Biostatistics online) that methods based on parental genotype–phenotype data usually have no greater power than MSE.

4. A real example

Type 1 diabetes [T1D, MIM 222100] is an auto-immune disease with a world-wide prevalence ranging from 1 to 17 cases per 100 000 subjects. More than 40 loci have been identified for T1D, such as variants in cytotoxic T-lymphocyte antigen-4 (CTLA-4, rs231775), interleukin-2 receptor-α (IL-2RA, rs2104286), and interleukin-7 receptor-α (IL-7RA, rs6897932) (Visscher and others, 2012). In addition to these genes, our past work has also considered variants in the Golgi N-glycosylation enzymes MGAT1 and MGAT5. We recently reported that the CTLA-4, IL2RA, and IL7RA variants alter N-glycosylation and interact with a haplotype of MGAT1 (MGAT1 IVAVT-T) to regulate T-cell function and risk of multiple sclerosis (MS), an autoimmune disease of the central nervous system (Mkhikian and others, 2011). Deficiency of MGAT5 in mice leads to T-cell hyper-activity and spontaneous autoimmunity (Demetriou and others, 2001; Lee and others, 2007), while two linked intronic human MGAT5 polymorphisms (rs4953911 and rs3814022) were the top hits in a GWAS for MS severity (Brynedal and others, 2010). We genotyped seven SNPs at the five genes (two at MGAT1, one at CTLA4, one at IL-2RA, one at IL-7RA, and two at MGAT5) for nuclear families recruited by the Type 1 Diabetes Genetics Consortium, using methods described previously (Mkhikian and others, 2011). Each family enrolled in this study was ascertained through a diseased case and only families with at least two affected children remained in the study. In total, data are available on 9367 subjects from 2395 Caucasian families.

We used the 1429 families that have both parents genotyped to calculate the Z statistics. Their values are shown in Table 7. Based on Z, our MSE approach selected the additive test at three SNPs and the recessive test at one SNP; it eliminated the dominant model at two SNPs and the recessive model at one SNP. Thus, the additive model is the most commonly selected model, which agrees with previous findings (Vukcevic and others, 2011; Hill and others, 2008). We used the FBAT software (Laird and others, 2000) to conduct family-based association tests under different genetic models. The p-values of the MSE approach are smaller than those from MAX3, suggesting that incorporating the information in Z does lead to some gain in efficiency. One finding of particular interest is SNP rs3814022 at MGAT5. This SNP is not significant under the additive or the dominant model. Because the resulting Z value is 2.09, our MSE method tested the SNP using the recessive model. The resulting p-value is 1.5E−3, which is significant at level 0.05 after Bonferroni’s correction for multiple comparisons. It is important to point out that the MGAT5 gene also appears to show a recessive pattern in our biological experiments (unpublished data). Although the Z statistic for the other SNP at MGAT5 (rs4953911) is not large enough to select the recessive model, it does eliminate the dominant model. Thus, our MSE approach not only provides the strongest significance but also likely identifies the true underlying model for rs3814022.
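The Bonferroni statement above can be checked with a few lines of arithmetic. The sketch below is illustrative only: it copies the MSE p-values from Table 7 and assumes that the correction is applied over the seven genotyped SNPs, which the text does not state explicitly. The variable names are ours.

```python
# MSE p-values copied from Table 7, one per SNP.
p_mse = {
    "rs7726005": 0.541,
    "rs2070924": 0.330,
    "rs231775": 2.4e-5,
    "rs2104286": 5.6e-7,
    "rs6897932": 0.010,
    "rs3814022": 1.5e-3,
    "rs4953911": 0.011,
}

alpha = 0.05
# Assumption: the number of comparisons equals the seven SNPs genotyped here.
n_tests = len(p_mse)
threshold = alpha / n_tests  # per-test significance level, roughly 0.00714

# SNPs whose MSE p-value falls below the Bonferroni-adjusted level.
significant = sorted(snp for snp, p in p_mse.items() if p < threshold)

print(f"Bonferroni threshold: {threshold:.5f}")
print("Significant SNPs:", significant)
```

Under this assumption, rs3814022 (p = 1.5E−3) clears the adjusted level together with rs231775 and rs2104286, whereas its additive p-value of 0.106 would not.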

Table 7. Results of the candidate-gene study of T1D

The columns Z, S, E, and P_MSE are based on mating-type data; the columns Most, Least, P_WT, and P_ME are based on parental genotype–phenotype data.

| SNP (gene) | Allele freq. | P_ADD | P_DOM | P_REC | P_MAX3 | Z | S | E | P_MSE | Most | Least | P_WT | P_ME |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| rs7726005 (MGAT1 IV) | 0.083 (T) | 0.541 | 0.526 | 0.969 | 0.783 | 0.06 | Add | | 0.541 | Rec | Dom | 0.790 | 0.780 |
| rs2070924 (MGAT1 V) | 0.043 (A) | 0.186 | 0.262 | 0.252 | 0.350 | 1.16 | | Dom | 0.330 | Dom | Rec | 0.205 | 0.214 |
| rs231775 (CTLA4) | 0.414 (G) | 2.4E−5 | 1.1E−3 | 1.7E−3 | 5.8E−5 | 0.50 | Add | | 2.4E−5 | Add | Rec | 3.0E−5 | 4.4E−5 |
| rs2104286 (IL-2RA) | 0.226 (C) | 5.6E−7 | 5.2E−6 | 7.8E−3 | 8.5E−7 | −0.13 | Add | | 5.6E−7 | Add | Rec | 6.6E−7 | 1.1E−6 |
| rs6897932 (IL-7RA) | 0.251 (T) | 6.8E−3 | 0.011 | 0.182 | 0.016 | −1.28 | | Rec | 0.010 | 2df | Dom | 0.010 | 0.012 |
| rs3814022 (MGAT5) | 0.276 (G) | 0.106 | 0.865 | 1.5E−3 | 3.4E−3 | 2.09 | Rec | | 1.5E−3 | 2df | Add | 0.031 | 0.106 |
| rs4953911 (MGAT5) | 0.315 (T) | 0.071 | 0.608 | 5.8E−3 | 0.014 | 1.19 | | Dom | 0.011 | 2df | Rec | 0.055 | 0.103 |

Column 1: the rs# of the SNPs and their corresponding genes. Column 2: the minor alleles and their frequencies. Columns 3–12: p-values based on different methods.

S: model selected by Z.

E: model eliminated by Z.

Among the three SNPs for which Z selects the additive model, two show their strongest evidence of association under the additive model. The additive p-values in these two cases are 2.4E−5 for rs231775 (CTLA4) and 5.6E−7 for rs2104286 (IL-2RA). Association of both of these SNPs with T1D has been reported previously (Nistico and others, 1996; WTCCC-Consortium, 2007).

Lastly, we examined how useful the information in parental genotype–phenotype data is for model selection or elimination. The p-values based on the information in parental genotype–phenotype data are usually greater than those based on our MSE approach or those calculated from MAX3, suggesting that the information in parental genotype–phenotype data is perhaps less useful than that in parental mating type. This is not surprising, as there are only 137 affected parents among the 3404 parents whose phenotypes are known.
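As a rough check of the comparison with MSE, the sketch below tallies how often the two p-values derived from parental genotype–phenotype data (P_WT and P_ME in Table 7) exceed the MSE p-value, and computes the share of affected parents. It is a simple count over the seven SNPs, not a formal comparison, and the variable names are ours.

```python
# (P_MSE, P_WT, P_ME) for each SNP, copied from Table 7.
table7 = {
    "rs7726005": (0.541, 0.790, 0.780),
    "rs2070924": (0.330, 0.205, 0.214),
    "rs231775": (2.4e-5, 3.0e-5, 4.4e-5),
    "rs2104286": (5.6e-7, 6.6e-7, 1.1e-6),
    "rs6897932": (0.010, 0.010, 0.012),
    "rs3814022": (1.5e-3, 0.031, 0.106),
    "rs4953911": (0.011, 0.055, 0.103),
}

# Count SNPs where the parental genotype-phenotype p-value is strictly larger
# than the MSE p-value (ties are not counted).
wt_greater = sum(1 for p_mse, p_wt, _ in table7.values() if p_wt > p_mse)
me_greater = sum(1 for p_mse, _, p_me in table7.values() if p_me > p_mse)

# Share of parents with known phenotype who are affected (137 of 3404).
affected_fraction = 137 / 3404

print(f"P_WT > P_MSE at {wt_greater} of {len(table7)} SNPs")
print(f"P_ME > P_MSE at {me_greater} of {len(table7)} SNPs")
print(f"Affected parents: {affected_fraction:.1%}")
```

With the Table 7 values, P_WT exceeds P_MSE at 5 of 7 SNPs and P_ME at 6 of 7, and about 4% of the phenotyped parents are affected, which is consistent with the limited information in the parental genotype–phenotype data noted above.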

5. Discussion

In this article, we proposed to improve the robustness of association tests for family-based data by inferring the true underlying model from parental data. The test statistic we proposed uses information from parental mating-type data. Because it follows the standard normal distribution asymptotically when the true underlying genetic model is additive, it can be used as a test for the mode of inheritance. Our simulations indicate that the proposed statistic provides considerable information for selecting the true model and eliminating the wrong model. As a result, our proposed MSE testing procedure has satisfactory robustness. The comparison of MSE with ADD indicates that MSE is a safe replacement for ADD. Although MSE has reduced power relative to 2DF and MAX3 when the true model is recessive and the effect size is moderate, it has improved power over 2DF and MAX3 when the true effect is additive. Given that most markers appear to show their strongest association under the additive model, we expect that MSE outperforms 2DF and MAX3 in most situations. The application of MSE to a real data set detected a recessive SNP that would not have been identified using the additive test.

The proposed Z test and the HWE test for cases share several similarities but also differ in important ways. For example, both provide information about the underlying genetic model; neither has power under additive effects; and both assume that the underlying population is a homogeneous, randomly mating population such that HWE holds. One important distinction between the two is that Z is calculated using parental mating type, and hence it is statistically independent of family-based association tests. In the presence of population stratification, the mating-type distribution can be distorted, leading to an excess of spouse pairs with the same genotypes (Sebro and others, 2010; Yasuda, 1968). In this situation, the chance that Z leads to the optimal test is reduced. However, because of the independence between Z and the family-based association test, the decision made based on Z might affect power but will not inflate type I error rates. In practice, SNPs with large HWE deviations are usually screened out before statistical analysis. As a result, the impact of HWE violation should be minimal, and we anticipate that the proposed method remains valuable in the presence of HWE violation.

Finally, the performance of the methods that incorporate parental genotype–phenotype data depends on the specific strategy, the allele frequency, and the disease prevalence. When the prevalence of a target disease is large, all three strategies work well because there is sufficient information in the parental genotype–phenotype data to glean the true underlying genetic model. In our candidate-gene study of T1D, the number of T1D cases in parents was small. Thus, there was no substantial advantage of using parental genotype–phenotype data for this example. We proposed a method (MSEM-PGP) to combine the mating-type and parental genotype–phenotype data. However, doing so does not provide improvements over methods based on mating-type data only. One likely explanation for this observation is that detecting interactions using case–control studies requires large sample sizes, whereas, in our case, dominant and recessive models can be considered as intra-locus interactions. Further investigation is needed to search for strategies that can better use all information available in family data.

An R program that implements our methods is available upon request.

Supplementary material

Supplementary material is available at http://biostatistics.oxfordjournals.org.

Funding

The research was supported in part by grant R01HG004960 from the National Human Genome Research Institute to Z.Y., grant R01AI082266 from the National Institute of Allergy and Infectious Diseases to M.D., and grant P30CA062203 from the National Cancer Institute to D.G.

Acknowledgments

The authors thank the two anonymous reviewers for their constructive comments. They also thank the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) for access to the T1DGC DNA samples. Conflict of Interest: None declared.

References

  1. Armitage P. Tests for linear trends in proportions and frequencies. Biometrics. 1955;11:375–386.
  2. Bhangale T. R., Rieder M. J., Nickerson D. A. Estimating coverage and power for genetic association studies using near-complete variation data. Nature Genetics. 2008;40:841–843. doi: 10.1038/ng.180.
  3. Brynedal B., Wojcik J., Esposito F., Debailleul V., Yaouanq J., Martinelli-Boneschi F., Edan G., Comi G., Hillert J., Abderrahim H. MGAT5 alters the severity of multiple sclerosis. The Journal of Neuroimmunology. 2010;220:120–124. doi: 10.1016/j.jneuroim.2010.01.003.
  4. Chen Z. X., Ng H. K. T. A robust method for testing association in genome-wide association studies. Human Heredity. 2012;73:26–34. doi: 10.1159/000334719.
  5. Clayton D. A generalization of the transmission/disequilibrium test for uncertain-haplotype transmission. The American Journal of Human Genetics. 1999;65:1170–1177. doi: 10.1086/302577.
  6. Demetriou M., Granovsky M., Quaggin S., Dennis J. W. Negative regulation of T-cell activation and autoimmunity by Mgat5 N-glycosylation. Nature. 2001;409:733–739. doi: 10.1038/35055582.
  7. Feng T., Zhang S. L., Sha Q. Y. Two-stage association tests for genome-wide association studies based on family data with arbitrary family structure. European Journal of Human Genetics. 2007;15:1169–1175. doi: 10.1038/sj.ejhg.5201902.
  8. Freidlin B., Zheng G., Li Z. H., Gastwirth J. L. Trend tests for case–control studies of genetic markers: power, sample size and robustness. Human Heredity. 2002;53:146–152. doi: 10.1159/000064976.
  9. Gastwirth J. L., Freidlin B. On power and efficiency robust linkage tests for affected sibs. Annals of Human Genetics. 2000;64:443–453. doi: 10.1046/j.1469-1809.2000.6450443.x.
  10. Gauderman W. J., Thomas D. C., Murcray C. E., Conti D., Li D. L., Lewinger J. P. Efficient genome-wide association testing of gene–environment interaction in case–parent trios. American Journal of Epidemiology. 2010;172:116–122. doi: 10.1093/aje/kwq097.
  11. Gonzalez J. R., Carrasco J. L., Dudbridge F., Armengol L., Estivill X., Moreno V. Maximizing association statistics over genetic models. Genetic Epidemiology. 2008;32:246–254. doi: 10.1002/gepi.20299.
  12. Hill W. G., Goddard M. E., Visscher P. M. Data and theory point to mainly additive genetic variance for complex traits. PLoS Genetics. 2008;4:e1000008. doi: 10.1371/journal.pgen.1000008.
  13. Horvath S., Xu X., Laird N. M. The family based association test method: strategies for studying general genotype–phenotype associations. European Journal of Human Genetics. 2001;9:301–306. doi: 10.1038/sj.ejhg.5200625.
  14. Ionita-Laza I., McQueen M. B., Laird N. M., Lange C. Genomewide weighted hypothesis testing in family-based association studies, with an application to a 100K scan. The American Journal of Human Genetics. 2007;81:607–614. doi: 10.1086/519748.
  15. Joo J., Kwak M. J., Zheng G. Improving power for testing genetic association in case-control studies by reducing the alternative space. Biometrics. 2010;66:266–276. doi: 10.1111/j.1541-0420.2009.01241.x.
  16. Laird N. M., Horvath S., Xu X. Implementing a unified approach to family-based tests of association. Genetic Epidemiology. 2000;19:36. doi: 10.1002/1098-2272(2000)19:1+<::AID-GEPI6>3.0.CO;2-M.
  17. Laird N. M., Lange C. The role of family-based designs in genome-wide association studies. Statistical Science. 2009;24:388–397.
  18. Lange C., DeMeo D., Silverman E. K., Weiss S. T., Laird N. M. Using the noninformative families in family-based association tests: a powerful new testing strategy. The American Journal of Human Genetics. 2003;73:801–811. doi: 10.1086/378591.
  19. Lee J. H., Tate C. M., You J. S., Skalnik D. G. Identification and characterization of the human Set1B histone H3-Lys(4) methyltransferase complex. Journal of Biological Chemistry. 2007;282:13419–13428. doi: 10.1074/jbc.M609809200.
  20. Lettre G., Lange C., Hirschhorn J. N. Genetic model testing and statistical power in population-based association studies of quantitative traits. Genetic Epidemiology. 2007;31:358–362. doi: 10.1002/gepi.20217.
  21. Li Q. Z., Zheng G., Liang X. Y., Yu K. Robust tests for single-marker analysis in case–control genetic association studies. Annals of Human Genetics. 2009;73:245–252. doi: 10.1111/j.1469-1809.2009.00506.x.
  22. Luss R., Rosset S., Shahar M. Efficient regularized isotonic regression with application to gene–gene interaction search. Annals of Applied Statistics. 2012;6:253–283.
  23. Mkhikian H., Grigorian A., Li C. F., Chen H. L., Newton B., Zhou R. W., Beeton C., Torossian S., Tatarian G. G., Lee S. U. and others. Genetics and the environment converge to dysregulate N-glycosylation in multiple sclerosis. Nature Communications. 2011;2:334. doi: 10.1038/ncomms1333.
  24. Murphy A., Weiss S. T., Lange C. Screening and replication using the same data set: testing strategies for family-based studies in which all probands are affected. PLoS Genetics. 2008;4:e1000197. doi: 10.1371/journal.pgen.1000197.
  25. Naylor M. G., Weiss S. T., Lange C. A Bayesian approach to genetic association studies with family-based designs. Genetic Epidemiology. 2010;34:569–574. doi: 10.1002/gepi.20513.
  26. Nistico L., Buzzetti R., Pritchard L. E., VanderAuwera B., Giovannini C., Bosi E., Larrad M. T. M., Rios M. S., Chow C. C., Cockram C. S., Jacobs K. and others. The CTLA-4 gene region of chromosome 2q33 is linked to, and associated with, type 1 diabetes. Human Molecular Genetics. 1996;5:1075–1080. doi: 10.1093/hmg/5.7.1075.
  27. Purcell S., Sham P., Daly M. J. Parental phenotypes in family-based association analysis. The American Journal of Human Genetics. 2005;76:249–259. doi: 10.1086/427886.
  28. Qin H. Z., Feng T., Zhang S. L., Sha Q. Y. A data-driven weighting scheme for family-based genome-wide association studies. European Journal of Human Genetics. 2010;18:596–603. doi: 10.1038/ejhg.2009.201.
  29. Rabinowitz D., Laird N. A unified approach to adjusting association tests for population admixture with arbitrary pedigree structure and arbitrary missing marker information. Human Heredity. 2000;50:211–223. doi: 10.1159/000022918.
  30. Rao C. R. Large sample tests of statistical hypotheses concerning several parameters with applications to problems of estimation. Proceedings of the Cambridge Philosophical Society. 1948;44:50–57.
  31. Sasieni P. D. From genotypes to genes: doubling the sample size. Biometrics. 1997;53:1253–1261.
  32. Schaid D. J., Sommer S. S. Genotype relative risks: methods for design and analysis of candidate-gene association studies. The American Journal of Human Genetics. 1993;53:1114–1126.
  33. Sebro R., Hoffman T. J., Lange C., Rogus J. J., Risch N. J. Testing for non-random mating: evidence for ancestry-related assortative mating in the Framingham Heart Study. Genetic Epidemiology. 2010;34:674–679. doi: 10.1002/gepi.20528.
  34. Self S. G., Longton G., Kopecky K. J., Liang K. Y. On estimating HLA/disease association with application to a study of aplastic anemia. Biometrics. 1991;47:53–61.
  35. Sham P. C., Cherny S. S., Purcell S., Hewitt J. K. Power of linkage versus association analysis of quantitative traits, by use of variance-components models, for sibship data. The American Journal of Human Genetics. 2000;66:1616–1630. doi: 10.1086/302891.
  36. Song K. J., Elston R. C. A powerful method of combining measures of association and Hardy–Weinberg disequilibrium for fine-mapping in case–control studies. Statistics in Medicine. 2006;25:105–126. doi: 10.1002/sim.2350.
  37. Spielman R. S., McGinnis R. E., Ewens W. J. Transmission test for linkage disequilibrium: the insulin gene region and insulin-dependent diabetes mellitus (IDDM). The American Journal of Human Genetics. 1993;52:506–516.
  38. Troendle J. F., Yu K. F., Mills J. L. Testing for genetic association with constrained models using triads. Annals of Human Genetics. 2009;73:225–230. doi: 10.1111/j.1469-1809.2008.00494.x.
  39. Van Steen K., McQueen M. B., Herbert A., Raby B., Lyon H., DeMeo D. L., Murphy A., Su J., Datta S., Rosenow C. Genomic screening and replication using the same data set in family-based association testing. Nature Genetics. 2005;37:683–691. doi: 10.1038/ng1582.
  40. Visscher P. M., Brown M. A., McCarthy M. I., Yang J. Five years of GWAS discovery. The American Journal of Human Genetics. 2012;90:7–24. doi: 10.1016/j.ajhg.2011.11.029.
  41. Vukcevic D., Hechter E., Spencer C., Donnelly P. Disease model distortion in association studies. Genetic Epidemiology. 2011;35:278–290. doi: 10.1002/gepi.20576.
  42. Wang K., Sheffield V. C. A constrained-likelihood approach to marker-trait association studies. The American Journal of Human Genetics. 2005;77:768–780. doi: 10.1086/497434.
  43. Whittemore A. S., Tu I. P. Detection of disease genes by use of family data. I. Likelihood-based theory. The American Journal of Human Genetics. 2000;66:1328–1340. doi: 10.1086/302851.
  44. Wittke-Thompson J. K., Pluzhnikov A., Cox N. J. Rational inferences about departures from Hardy–Weinberg equilibrium. The American Journal of Human Genetics. 2005;76:967–986. doi: 10.1086/430507.
  45. WTCCC-Consortium. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature. 2007;447:661–678. doi: 10.1038/nature05911.
  46. Xu J., Yuan A., Zheng G. Bayes factor based on the trend test incorporating Hardy–Weinberg disequilibrium: more power to detect genetic association. Annals of Human Genetics. 2012;76:301–311. doi: 10.1111/j.1469-1809.2012.00714.x.
  47. Xu X., Rakovski C., Xu X. P., Laird N. An efficient family-based association test using multiple markers. Genetic Epidemiology. 2006;30:620–626. doi: 10.1002/gepi.20174.
  48. Yasuda N. An extension of Wahlund’s principle to evaluate mating type frequency. The American Journal of Human Genetics. 1968;20:1–23.
  49. Zheng G., Freidlin B., Gastwirth J. L. Robust TDT-type candidate-gene association tests. Annals of Human Genetics. 2002;66:145–155. doi: 10.1017/S0003480002001045.
  50. Zheng G., Joo J., Zaykin D., Wu C., Geller N. Robust tests in genome-wide scans under incomplete linkage disequilibrium. Statistical Science. 2009;24:503–516. doi: 10.1214/09-sts314.
  51. Zheng G., Ng H. K. T. Genetic model selection in two-phase analysis for case–control association studies. Biostatistics. 2008;9:391–399. doi: 10.1093/biostatistics/kxm039.
  52. Zheng G., Song K., Elston R. C. Adaptive two-stage analysis of genetic association in case–control designs. Human Heredity. 2007;63:175–186. doi: 10.1159/000099830.

Articles from Biostatistics (Oxford, England) are provided here courtesy of Oxford University Press
