Unbiased and Locally Efficient Estimation of Genetic Effect on Quantitative Trait in the Presence of Population Admixture

Yuanjia Wang; Qiong Yang; Daniel Rabinowitz

doi:10.1111/j.1541-0420.2010.01454.x

Author manuscript; available in PMC: 2012 Jun 1.

Published in final edited form as: Biometrics. 2010 Jun 16;67(2):331–343. doi: 10.1111/j.1541-0420.2010.01454.x

PMCID: PMC2948587 NIHMSID: NIHMS205250 PMID: 20560930

Abstract

Population admixture can be a confounding factor in genetic association studies. Family-based methods (Rabinowitz and Laird 2000) have been proposed in both testing and estimation settings to adjust for this confounding, especially in case-only association studies. The family-based methods rely on conditioning on the observed parental genotypes or on the minimal sufficient statistic for the genetic model under the null hypothesis. In some cases these methods do not capture all the available information because the conditioning strategy is too stringent. General efficient methods to adjust for population admixture that use all the available information have been proposed (Rabinowitz 2002). However, these approaches may not be easy to implement in some situations. A previously developed easy-to-compute approach adjusts for admixture by adding supplemental covariates to linear models (Yang et al. 2000). Here it is shown that this strategy of augmenting linear models with appropriate covariates can be combined with the general efficient methods in Rabinowitz (2002) to provide computationally tractable and locally efficient adjustment. After deriving the optimal covariates, the adjusted analysis can be carried out using standard statistical software packages such as SAS or R. The proposed methods enjoy local efficiency in a neighborhood of the true model. Simulation studies show that non-trivial efficiency gains can be obtained by using information not accessible to the methods that rely on conditioning on the minimal sufficient statistics. The approaches are illustrated through an analysis of the influence of apolipoprotein E (APOE) genotype on plasma low density lipoprotein (LDL) concentration in children.

Keywords: Genetic association study, Family-based study, Population stratification

1 Introduction

Association studies are used to locate genetic loci influencing a trait of interest by evaluating the association between allelic variability at a candidate locus and the trait. These association-based methods are especially useful for evaluating genetic factors with small to moderate effects (Risch and Merikangas 1996). However, it is well known that population admixture can be a confounding factor for association-based methods (see for example, Elston 1998). When the study sample consists of subjects drawn from subpopulations with different allele frequencies and trait distributions, spurious association may be detected even when the trait and the gene are not biologically linked.

Family-based association studies have been proposed to adjust for population admixture. The transmission disequilibrium test (TDT) and its extensions (Falk and Rubinstein 1987; Terwilliger and Ott 1992; Spielman, McGinnis, and Ewens 1993; Spielman and Ewens 1998; Lazzeroni and Lange 1998) examine the transmission of parental alleles to the offspring in a family given the parental genotypes. By conditioning on the parental genotypes, the bias due to population admixture is avoided. However, when the parental genotypes are not all observed, these tests cannot be applied. Applying the TDT to restricted data sets of families with complete parental genotypes can result in bias (Curtis and Sham 1995). Rabinowitz and Laird (2000) proposed a general approach that adjusts for population admixture by comparing the test statistics to their conditional distributions given the minimal sufficient statistics for the genetic model under the null hypothesis. This approach does not require complete parental genotype information.

When the association between a trait and a locus has been established by testing, it may be desirable to estimate the form and strength of the trait-genotype association and to evaluate the interaction between genotypes and environmental factors. Yang et al. (2000) proposed an approach to estimate the candidate gene effect in a linear model that is not affected by spurious association. In this approach, the paradigm of conditioning on minimal sufficient statistics is achieved by augmenting the standard regression models with appropriate additional covariates. The approach is computationally convenient: after obtaining the additional covariates, the analysis can be carried out by standard statistical packages such as SAS or R.

The approaches proposed in Rabinowitz and Laird (2000) for testing and in Yang et al. (2000) for estimation do not capture all of the available information (the minimal sufficient statistic is not always complete). Rabinowitz (2002) proposed a general framework for developing efficient test statistics that exploit all of the available information that is not potentially confounded by population stratification. Whittemore (2004) proposed a general framework for efficient estimating functions that protect against population stratification. Allen et al. (2005) developed locally efficient estimation of haplotype-disease association in case-parent trio designs that is robust to confounding.

Here it is shown that the approach of eliminating bias by adding supplemental covariates to a linear model in Yang et al. (2000) can be combined with the method of deriving efficient estimating equations in Whittemore (2004) or Rabinowitz (2002) to obtain a computationally tractable estimation approach that is efficient but not confounded by population admixture. The optimal supplemental covariates are obtained through matrix algebra calculations. In the cases where the family-specific effects are absent, the additional covariates have closed form expressions. The proposed methods enjoy local efficiency in a neighborhood of the true model. We use simulation studies to investigate unbiasedness and efficiency of the methods under conditions including violation of assumptions and departure from the true model. The simulation results show that non-trivial efficiency gains can be obtained by using information not accessible to methods that rely on conditioning on the minimal sufficient statistics. The approaches are illustrated through an analysis of the influence of apolipoprotein E (APOE) genotype on plasma low density lipoprotein (LDL) concentration in children.

2 Methods

In this section linear models for quantitative traits are introduced and optimal additional covariates to be included in the models are derived. Two models are considered: one does not involve family-specific terms and the other includes random family-specific effects.

2.1 Model without family-specific effects

Let Y_ij denote a quantitative trait of the j^th individual in the i^th family, and let G_ij denote the genotype of the same individual at a candidate locus. A simple linear model relating Y_ij to G_ij is

Y_{i j} = X (G_{i j}) β + ε_{i j},

(1)

where X(G_ij) is a coding for the genotype, and β is the effect of the genotype on the trait. For example, for a recessive trait, X(G_ij) can take value one for subjects carrying two copies of the disease allele and value zero for subjects carrying zero or one copy of the disease allele. The ε_ij are residual effects other than the genotypes under examination, which may be environmental factors that are independent of the genotypes or genetic factors at unlinked loci. When there is no population admixture, the ε_ij are independent of G_ij and the usual least square estimate of β is unbiased. However, when population admixture is present, subpopulation membership is part of the residual effects ε_ij. Since subpopulation membership influences the genotype distribution, ε_ij and G_ij are correlated, so that the ordinary least square estimate of β is biased.

To motivate the derivation of an efficient estimator of β when the genotypes and the residual effects are not independent, it is useful to review the method proposed in Yang et al. (2000). The validity of Yang et al. (2000) relies on the fact that even though G_ij and ε_ij in model (1) may be marginally correlated, they are conditionally independent given the minimal sufficient statistic for the genetic model under the null hypothesis. In Yang et al. (2000), after adding the conditional expectation of X(G_ij) given the minimal sufficient statistics as additional covariates to the model, the least square estimate for β is unbiased even when population admixture is present. In the current work, the form of the optimal additional covariates that leads to an efficient estimator of β is initially unknown, and is derived from a constrained optimization problem.

Similar to Yang et al. (2000), the key assumption underlying the proposed approaches is that although the genotype-related covariates and the residuals are correlated due to population admixture, they are conditionally independent given the founder genotypes. As noted in Yang et al. (2000), this assumption corresponds to the transmission of parental alleles to the offspring generation being independent of any other factors that influence the trait, given the parental genotypes. While this assumption is surely an approximation of the biological truth, it is the basis for other methods adjusting for population admixture in family-based association studies (for example TDT and FBAT).
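To make the role of conditioning on founder genotypes concrete, the following is a minimal sketch of the Mendelian conditional distribution of one offspring genotype given both parental genotypes, and of the resulting conditional mean of a genotype coding. It assumes a biallelic locus with genotypes recorded as disease-allele counts (0, 1, 2) and fully observed parents; the function names are hypothetical and are not taken from the paper.

```python
import numpy as np

def transmit_probs(genotype):
    """P(parent transmits 0 or 1 disease alleles | parent's allele count in {0, 1, 2})."""
    p_one = genotype / 2.0
    return np.array([1.0 - p_one, p_one])

def offspring_distribution(father, mother):
    """P(offspring carries 0, 1, 2 disease alleles | parental allele counts)."""
    # Independent transmissions from each parent: the sum is a convolution.
    return np.convolve(transmit_probs(father), transmit_probs(mother))

def conditional_mean_coding(father, mother, coding):
    """E[X(G) | parental genotypes] for a coding given as (X(0), X(1), X(2))."""
    return float(offspring_distribution(father, mother) @ np.asarray(coding, dtype=float))

# Dominant coding: X = 1 for carriers of at least one disease allele.
dominant = [0.0, 1.0, 1.0]
print(conditional_mean_coding(1, 1, dominant))  # two heterozygous parents
```

Under the conditional independence assumption, a covariate built this way depends on the data only through the parental genotypes, which is why subtracting it from X(G_ij) removes the part of the genotype that can be confounded with subpopulation membership.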

Let U_ij denote the unknown additional covariate for the j^th individual from the i^th family, let n denote the total number of families, let n_i denote the number of subjects in the i^th family, and let N denote the total number of subjects. A linear model relating a trait to a genotype with additional covariates is

Y_{i j} = X (G_{i j}) β + U_{i j} γ + ε_{i j} .

Without loss of generality, the above model can be written as

Y_{i j} = (X (G_{i j}) - U_{i j}) β + U_{i j} \tilde{γ} + ε_{i j},

(2)

where γ̃ = β + γ, since X(G_ij)β + U_ij γ = (X(G_ij) − U_ij)β + U_ij(β + γ). In standard multiple regression analysis, the least square estimates of a subset of regression coefficients can be obtained by first regressing the covariates of these coefficients on the covariates of the remaining coefficients, taking the residuals, and then regressing the response variable on the residuals. We apply this observation to model (2), where we are interested in estimating β. When the covariates X(G_ij) − U_ij are uncorrelated with U_ij (in terms of expectation), the residuals of regressing X(G_ij) − U_ij on U_ij are X(G_ij) − U_ij themselves. Therefore the least square estimate of β can be written as

\hat{β} = {[{(X - U)}^{T} (X - U)]}^{- 1} {(X - U)}^{T} Y,

(3)

where the unknown covariates U satisfy

E {(X - U)}^{T} U = 0.

(4)

Here X_i = (X(G_i₁), ···, X(G_{in_i}))^T, U_i = (U_i₁, ···, U_{in_i})^T, X = (X₁, ···, X_n)^T, U = (U₁, ···, U_n)^T, and Y = (Y₁, ···, Y_n)^T.
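The partial-regression fact used to obtain (3) can be checked numerically. The sketch below uses synthetic data (the variables x, u and y are illustrative stand-ins, not quantities from the paper) and compares the coefficient of x from a joint regression on (x, u) with the coefficient from regressing y on the residual of x after projecting out u.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
u = rng.normal(size=n)                      # stands in for the remaining covariate
x = 0.6 * u + rng.normal(size=n)            # covariate of interest, correlated with u
y = 2.0 * x - 1.0 * u + rng.normal(size=n)  # response

# Joint least squares fit of y on (x, u), no intercept.
full_coef, *_ = np.linalg.lstsq(np.column_stack([x, u]), y, rcond=None)

# Residual of x after regressing it on u, then regress y on that residual.
x_res = x - u * (u @ x) / (u @ u)
beta_partial = (x_res @ y) / (x_res @ x_res)

print(full_coef[0], beta_partial)  # the two estimates agree up to rounding
```

When x is already orthogonal to u, the residual equals x itself, which is the situation that constraint (4) arranges for X − U and U in expectation.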

Let $G_{i}^{★}$ denote the genotypes of founders in the i^th family. Note that $G_{i}^{★}$ may not be completely observed. An important assumption ensuring the validity of the proposed methods analogous to that in Yang et al. (2000) is that even though G_ij and ε_ij may be marginally dependent, they are conditionally independent given $G_{i}^{★}$ . It is shown in the appendix that under this assumption, the expectation of β̂ in (3) is

β + E {{[{(X - U)}^{T} (X - U)]}^{- 1} \sum_{i} E [{(X_{i} - U_{i})}^{T} 1_{n_{i}} ∣ G_{i}^{★}] E [ε_{i} ∣ G_{i}^{★}]},

(5)

and the variance of β̂ is

σ^{2} E {{[{(X - U)}^{T} (X - U)]}^{- 1}},

(6)

where 1_{n_i} is the n_i × 1 vector of ones. From (5), the condition for obtaining an unbiased estimate of β is then

E [{(X_{i} - U_{i})}^{T} 1_{n_{i}} ∣ G_{i}^{★}] = 0, i = 1, \dots, n .

(7)

It follows that the optimal unbiased estimator for β can be obtained by minimizing (6) under the constraints (4) and (7).

It is convenient to introduce some notation to describe the solution to the above constrained optimization problem. Let $ϑ_{i}^{★}$ denote all possible founder genotypes compatible with those observed in the i^th family, and let ϑ_i denote all possible combinations of offspring genotypes in the i^th family. Let c_i denote the dimension of ϑ_i and let d_i denote the dimension of $ϑ_{i}^{★}$ . Let W_i denote the c_i × c_i diagonal matrix with the diagonal entries given by the (possibly misspecified) probabilities of founder genotypes, P(g). In practice, the W_i are usually computed from observed founder genotypes. Let Z_i denote the c_i × d_i matrix with the (g,g^★) entry given by the conditional probability of an offspring genotype given the founder genotypes, P(g|g^★). Let X_i denote the n_i × c_i matrix whose rows index individuals in a family and whose columns index components of ϑ_i; that is, the k^th row in the matrix is (X_ik(g₁), ···, X_ik(g_{c_i})). Let V_i denote the n_i × c_i matrix with the (m,g)^th component being U_im(g). The corresponding matrices W_i, Z_i, X_i and V_i for an example pedigree are given in the appendix. With this notation, it is shown in the appendix that the solution to the constrained optimization problem of minimizing (6) subject to (4) and (7) is

V_{i} = X_{i} Z_{i} {(Z_{i}^{T} W_{i}^{- 1} Z_{i})}^{- 1} Z_{i}^{T} W_{i}^{- 1} .

(8)

The additional covariates U_i can be picked from the rows of V_i that correspond to the observed genotypes in members of the i^th family. A simple example illustrating the computations is presented in the appendix.
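The matrix formula (8) can be evaluated directly with a few lines of linear algebra. The following is a sketch for a hypothetical family with a single genotyped child and unobserved parents; the matrices Z and X and the weight vectors are made-up illustrative inputs, not the appendix example. It also checks numerically that V_i Z_i = X_i Z_i for any positive diagonal W_i, which is the algebraic identity behind the unbiasedness condition (7).

```python
import numpy as np

def optimal_covariates(X, Z, w):
    """Evaluate V = X Z (Z' W^-1 Z)^-1 Z' W^-1 from equation (8), with W = diag(w)."""
    W_inv = np.diag(1.0 / np.asarray(w, dtype=float))
    ZtWi = Z.T @ W_inv
    return X @ Z @ np.linalg.solve(ZtWi @ Z, ZtWi)

# Rows of Z: offspring genotypes AA, Aa, aa.
# Columns of Z: two assumed compatible mating types, Aa x Aa and Aa x aa.
Z = np.array([[0.25, 0.0],
              [0.50, 0.5],
              [0.25, 0.5]])
X = np.array([[1.0, 1.0, 0.0]])        # dominant coding for one child (n_i = 1)
w_first = np.array([0.1, 0.4, 0.5])    # illustrative weights, not estimated from data
w_second = np.array([0.6, 0.3, 0.1])   # a deliberately different set of weights

V_first = optimal_covariates(X, Z, w_first)
V_second = optimal_covariates(X, Z, w_second)

print(V_first, V_second)          # the covariates depend on the weights
print(V_first @ Z, V_second @ Z)  # both equal X @ Z
```

The covariate values change with the weights, but their conditional means given each founder configuration do not, so only efficiency is affected by the choice of W_i.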

We use family-specific residual sums to estimate the variance of β̂. By the conditions (4) and (7), the estimating equation for β̂,

\sum_{i} {(X_{i} - U_{i})}^{T} [Y_{i} - (X_{i} - U_{i}) β],

has expectation zero. Since the family-specific residual terms are independent, the variance of the solution to this estimating equation is (Cox and Hinkley 1979)

\frac{\sum_{i} {[{(X_{i} - U_{i})}^{T} (Y_{i} - (X_{i} - U_{i}) \hat{β})]}^{2}}{{[\sum_{i} {(X_{i} - U_{i})}^{T} (X_{i} - U_{i})]}^{2}} .

(9)
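The estimator (3) and the family-based variance estimate (9) are simple to compute once X_i − U_i is available. Below is a minimal sketch for a scalar genotype coding; the function name and the tiny two-family data set are hypothetical and chosen only so the arithmetic can be followed by hand.

```python
import numpy as np

def adjusted_ols(D_list, Y_list):
    """Return (beta_hat, var_hat) from per-family arrays D_i = X_i - U_i and Y_i."""
    num = sum(float(D @ Y) for D, Y in zip(D_list, Y_list))
    den = sum(float(D @ D) for D in D_list)
    beta_hat = num / den                                   # equation (3)
    # Family-specific residual sums: one score contribution per family.
    scores = [float(D @ (Y - D * beta_hat)) for D, Y in zip(D_list, Y_list)]
    var_hat = sum(s ** 2 for s in scores) / den ** 2       # equation (9)
    return beta_hat, var_hat

# Two toy families with two members each.
D_list = [np.array([1.0, -1.0]), np.array([1.0, 1.0])]
Y_list = [np.array([3.0, 1.0]), np.array([2.0, 4.0])]

beta_hat, var_hat = adjusted_ols(D_list, Y_list)
print(beta_hat, var_hat)
```

Summing residual products within a family before squaring is what allows for correlation among members of the same family, since only independence across families is used.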

One advantage of the proposed estimator is that it is unbiased even when the founder genotype distributions W_i are misspecified. To see this, note that the condition for obtaining unbiasedness (7) holds regardless of whether the marginal probabilities W_i are correctly specified. To be specific, denote the misspecified W_i as $W_{i}^{*}$ . As derived in the appendix, the solutions for the corresponding $U_{i}^{*}$ are the components of

X_{i} Z_{i} {(Z_{i}^{T} W_{i}^{* - 1} Z_{i})}^{- 1} Z_{i}^{T} W_{i}^{* - 1} .

In the notation introduced above, taking the conditional expectation of a random variable given the founder genotypes amounts to multiplying it by components of Z_i. Therefore, from

\begin{array}{l} E [{(X_{i} - U_{i}^{*})}^{T} 1_{n_{i}} ∣ G_{i}^{★}] = 1_{n_{i}}^{T} Z_{i}^{T} X_{i}^{T} 1_{n_{i}} - 1_{n_{i}}^{T} Z_{i}^{T} W_{i}^{* - 1} Z_{i} {(Z_{i}^{T} W_{i}^{* - 1} Z_{i})}^{- 1} Z_{i}^{T} X_{i}^{T} 1_{n_{i}} \\ = 1_{n_{i}}^{T} Z_{i}^{T} X_{i}^{T} 1_{n_{i}} - 1_{n_{i}}^{T} Z_{i}^{T} X_{i}^{T} 1_{n_{i}} = 0, \end{array}

it follows that the unbiasedness condition (7) is satisfied with misspecified W_i (denoted as $W_{i}^{*}$ ) and misspecified U_i (denoted as $U_{i}^{*}$ ). Although in this case the adjusted estimator remains unbiased, it is not efficient. We study the magnitude of efficiency loss due to misspecification of W by simulations in section 3.

2.2 Including family-specific effects

In some situations there may exist family-specific effects influencing a trait. The linear model in this case can be expressed as:

Y_{i j} = X (G_{i j}) β + α_{i} + ε_{i j},

(10)

where α_i is a random family-specific factor. Due to population stratification, α_i may not be independent of G_ij, but may be independent of the subject-specific residual effects ε_ij. An assumption ensuring validity of the proposed methods in this case is that even though G_ij and α_i may be marginally correlated, they are conditionally independent given $G_{i}^{★}$ .

When there are family-specific effects, to obtain the optimal estimator, weighted least square

{\hat{β}}^{'} = {[{(X - U)}^{T} Σ^{- 1} (X - U)]}^{- 1} {(X - U)}^{T} Σ^{- 1} Y

(11)

should be used, where Σ is the conditional covariance matrix of Y given G^★. From the appendix, the expectation of β̂′ is

β + E {{[{(X - U)}^{T} Σ^{- 1} (X - U)]}^{- 1} \sum_{i} E [{(X_{i} - U_{i})}^{T} Σ_{i}^{- 1} 1_{n_{i}} ∣ G_{i}^{★}] E [α_{i} ∣ G_{i}^{★}]} + E {{[{(X - U)}^{T} Σ^{- 1} (X - U)]}^{- 1} \sum_{i} E [{(X_{i} - U_{i})}^{T} Σ_{i}^{- 1} 1_{n_{i}} ∣ G_{i}^{★}] E [ε_{i j} ∣ G_{i}^{★}]},

(12)

where Σ_i is the conditional covariance of the observations from the i^th family given the founder genotypes, $cov (Y_{i}, Y_{i}^{T} ∣ G_{i}^{★})$ . The unbiasedness condition analogous to (7) should be modified as

E [{(X_{i} - U_{i})}^{T} Σ_{i}^{- 1} 1_{n_{i}} ∣ G_{i}^{★}] = 0, i = 1, \dots, n .

(13)

The constraint analogous to (4) should be modified as

E {(X - U)}^{T} Σ^{- 1} U = 0.

(14)

In addition, we show in the appendix that the variance to be minimized is

E {{[{(X - U)}^{T} Σ^{- 1} (X - U)]}^{- 1}} .

(15)

Now define

X_{i}^{'} = Σ_{i}^{- \frac{1}{2}} X_{i}, U_{i}^{'} = Σ_{i}^{- \frac{1}{2}} U_{i}, Y_{i}^{'} = Σ_{i}^{- \frac{1}{2}} Y_{i}, and 1_{n_{i}}^{'} = Σ_{i}^{- \frac{1}{2}} 1_{n_{i}} .

With this notation, the constraint (13) takes the form of (7), the constraint (14) takes the form of (4), and the minimization term (15) takes the form of (6), with X_i, U_i and 1_{n_i} replaced by $X_{i}^{'}, U_{i}^{'}$ , and $1_{n_{i}}^{'}$ . Consequently, carrying out the same constrained optimization procedure as for model (1) using the newly defined variables leads to the solution

V_{i}^{'} = X_{i}^{'} Z_{i} {(Z_{i}^{T} W_{i}^{- 1} Z_{i})}^{- 1} Z_{i}^{T} W_{i}^{- 1} .

(16)

In practice, we can fit a linear mixed effects model that has a random family-specific effect and include the original supplemental covariates without considering the family-specific effects (the β̂ obtained from fitting a linear mixed effects model is estimated by weighted least square).
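The reduction of the weighted problem to the unweighted one through the transformation by Σ_i^{-1/2} can be verified numerically. The sketch below assumes, purely for illustration, a known compound-symmetry covariance for families of size two (residual variance 25 plus a shared family variance 15) and uses random numbers in place of X_i − U_i; none of these inputs come from the paper's data.

```python
import numpy as np

def inv_sqrt(S):
    """Symmetric inverse square root of a positive definite matrix."""
    vals, vecs = np.linalg.eigh(S)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

rng = np.random.default_rng(1)
n_fam, n_i = 50, 2
Sigma_i = 25.0 * np.eye(n_i) + 15.0 * np.ones((n_i, n_i))  # assumed known here

D = rng.normal(size=(n_fam, n_i))             # row i stands in for X_i - U_i
Y = 10.0 * D + rng.normal(size=(n_fam, n_i))  # toy responses

# Weighted least square as in (11), accumulated family by family.
S_inv = np.linalg.inv(Sigma_i)
beta_wls = sum(d @ S_inv @ y for d, y in zip(D, Y)) / sum(d @ S_inv @ d for d in D)

# Ordinary least square after transforming each family by Sigma_i^{-1/2}.
R = inv_sqrt(Sigma_i)
D_t, Y_t = D @ R.T, Y @ R.T
beta_transformed = np.sum(D_t * Y_t) / np.sum(D_t * D_t)

print(beta_wls, beta_transformed)  # identical up to rounding
```

In an actual analysis Σ_i is not known and would be estimated, for instance through the variance components of a linear mixed effects model as described above.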

It is worth mentioning that population admixture behaves as a source of family-specific effects (Fulker et al. 1999; Abecasis et al. 2000). The weighted least square approach therefore provides efficiency gain over the ordinary least square even when there are no additional family-specific effects, α_i. We compare the two approaches by simulations. For the weighted least square, the variance can be estimated by

\frac{\sum_{i} {[{(X_{i} - U_{i})}^{T} {\hat{Σ}}_{i}^{- 1} (Y_{i} - (X_{i} - U_{i}) \hat{β})]}^{2}}{{[\sum_{i} {(X_{i} - U_{i})}^{T} {\hat{Σ}}_{i}^{- 1} (X_{i} - U_{i})]}^{2}} .

(17)

3 Simulation Studies

In this section, we use extensive simulation studies to evaluate properties of the proposed methods. We examine the effect of misspecifying marginal probabilities in W and the influence of the family-specific effects. We also compare the ordinary least square with the weighted least square, and the proposed methods with Yang et al. (2000).

We generated 100 nuclear families, each with two children. To simulate population stratification, we drew parental genotypes from a 50:50 admixture of two populations. Parents in the same family were drawn from the same population. The disease allele frequencies in the two subpopulations were 0.1 and 0.3, so that the marginal disease allele frequency in the whole population was 0.2. The parental genotypes within each subpopulation were simulated under Hardy-Weinberg equilibrium. The offspring genotypes were simulated based on Mendelian transmission probabilities. We assumed a dominant effect of the disease allele. We simulated a linear model with a different intercept for each population. To investigate the impact of varying severity of population admixture, we simulated several combinations of the intercepts. The intercept in the first population was 5, while that in the second population was 10 or 20. The genetic effect was chosen to be 10, 20, 50 or 100. In the models with random family-specific effects, the variances of these effects were 5, 15 or 25. There were 1000 replications in each set of simulations. We simulated residuals from a normal distribution with mean zero and standard deviation five.
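The data-generating design described above can be sketched in a few lines. This is an illustrative reconstruction under stated assumptions, not the authors' simulation code: parental genotypes are treated as observed, so the supplemental covariate is simply the Mendelian conditional mean of the dominant coding given the parents, no family-specific random effect is included, and all function names are hypothetical.

```python
import numpy as np

def simulate_families(n_families=100, beta=10.0, mu=(5.0, 10.0),
                      freqs=(0.1, 0.3), resid_sd=5.0, seed=0):
    """Simulate nuclear families with two children from a 50:50 admixture."""
    rng = np.random.default_rng(seed)
    pop = rng.integers(0, 2, size=n_families)   # both parents share one population
    p = np.asarray(freqs)[pop]
    father = rng.binomial(2, p)                 # Hardy-Weinberg allele counts
    mother = rng.binomial(2, p)
    # Mendelian transmission: each parent passes a disease allele w.p. count / 2.
    children = np.column_stack([
        rng.binomial(1, father / 2.0) + rng.binomial(1, mother / 2.0)
        for _ in range(2)
    ])
    x = (children >= 1).astype(float)           # dominant coding
    # E[X | parents]: probability that a child carries at least one disease allele.
    u = 1.0 - (1.0 - father / 2.0) * (1.0 - mother / 2.0)
    u = np.repeat(u[:, None], 2, axis=1)
    y = np.asarray(mu)[pop][:, None] + beta * x + rng.normal(0.0, resid_sd, size=x.shape)
    return x, u, y

def unadjusted_ols(x, y):
    """Slope from regressing the trait on the genotype coding with an intercept."""
    design = np.column_stack([np.ones(x.size), x.ravel()])
    coef, *_ = np.linalg.lstsq(design, y.ravel(), rcond=None)
    return coef[1]

def adjusted_ols(x, u, y):
    """Estimator of the form (3) with D = X - U."""
    d = (x - u).ravel()
    return float(d @ y.ravel() / (d @ d))

x, u, y = simulate_families(n_families=100, beta=10.0)
print(unadjusted_ols(x, y), adjusted_ols(x, u, y))
```

With a single replicate of 100 families both estimates are noisy; averaging over many replicates, or increasing the number of families, is needed to see the systematic upward shift of the unadjusted slope.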

We first examine the performance of the proposed methods under the null hypothesis (β = 0). Table 1 shows the type I error rates of test statistics computed using various methods. When the population admixture is moderate, the type I error rates of the unadjusted ordinary least square and unadjusted weighted least square are clearly much higher than the nominal level. For example, for α = 0.05, the type I error is 0.43 for the former and 0.34 for the latter. In contrast, the proposed adjusted ordinary and weighted least square methods maintain the desired error rates. We also investigate the method in Yang et al. (2000) and find its type I error rate to be close to the nominal level. When the population admixture is more severe, the type I error rates of the two unadjusted analyses are substantially higher than the nominal level, while the proposed approach and Yang et al. (2000) maintain the correct α-level.

Table 1.

Type I error rates of various methods

α level	Unadjusted OLS	Adjusted OLS	Unadjusted WLS	Adjusted WLS	Adjusted Yang^†
μ₁ = 5, μ₂ = 10
0.01	0.208	0.007	0.138	0.008	0.007
0.05	0.432	0.041	0.343	0.05	0.043
0.1	0.57	0.091	0.477	0.099	0.09

μ₁ = 5, μ₂ = 20
0.01	0.83	0.007	0.201	0.012	0.007
0.05	0.952	0.043	0.425	0.053	0.046
0.1	0.975	0.103	0.571	0.105	0.106

^† Adjusted by the method in Yang et al. (2000)

Next we examine performance of various methods under the alternative hypothesis (β ≠ 0). The first set of simulations corresponds to model (1) where there are no family-specific effects. Table 2 summarizes results for the ordinary least square and the weighted least square method. We first examined the properties of the proposed estimator when there was no genetic effect, i.e., β = 0. Under this null model, the unadjusted estimator reported a large spurious genetic effect, that is, the mean β̂ = 1.54 for the ordinary least square and the mean β̂ = 1.39 for the weighted least square analysis. In contrast, the adjusted estimators were very close to zero: the mean β̂ was 0.02 (average empirical SE=1.38) for the ordinary least square and the mean β̂ = 0.02 (average empirical SE=1.37) for the weighted least square. Next we examined the estimator where the true genetic effect was greater than zero (β = 10, 20, 50 or 100). It can be seen that the unadjusted estimates had large bias while the adjusted estimates had negligible bias: the bias of the former ranged from 1.48 to 4.53, while for the latter it ranged from zero to 0.07. The magnitude of the bias for the unadjusted methods increased with the severity of population stratification. When the population admixture was moderate (μ₁=5, μ₂=10), the bias of the unadjusted least square estimator was around 1.5. When the population admixture was more severe (μ₁=5, μ₂=20), the bias increased to around 4.5. The bias for the unadjusted weighted least square estimator in each scenario of the population admixture was 1.3 and 2.3, respectively. The magnitude of the bias was similar across all values of the genetic effect for both estimators.

Table 2.

Estimates with correctly specified W: no family-specific effects

Ordinary least square estimates (OLS)
	μ₁=5, μ₂=10				μ₁ =5, μ₂ =20

True β	Unadjusted Mean β̂	Adjusted Mean β̂	Empirical S.E.	Estimated S.E.	Unadjusted Mean β̂	Adjusted Mean β̂	Empirical S.E.	Estimated S.E.
0	1.54	0.02	1.38	1.31	4.48	−0.02	1.89	1.88
10	11.48	9.98	1.32	1.31	14.53	9.98	1.90	1.87
20	21.53	20.01	1.35	1.31	24.48	19.93	1.88	1.87
50	51.51	50.00	1.33	1.31	54.50	49.99	1.89	1.89
100	101.51	100.05	1.34	1.31	104.43	100.00	1.89	1.87

Weighted least square estimates (WLS)
0	1.39	0.02	1.37	1.32	2.27	0.02	1.45	1.47
10	11.32	9.98	1.32	1.32	12.27	10.02	1.49	1.47
20	21.31	20.01	1.33	1.32	22.26	19.95	1.48	1.47
50	51.35	49.99	1.33	1.32	52.30	50.02	1.48	1.46
100	101.38	100.05	1.32	1.32	102.22	100.05	1.47	1.46

Comparing to Yang et al. (2000)
True β	Yang Mean β̂	Empirical S.E.	Eff. gain OLS^†	Eff. gain WLS^‡	Yang Mean β̂	Empirical S.E.	Eff. gain OLS^†	Eff. gain WLS^‡

10	10.32	2.20	40%	40%	10.88	2.91	35%	49%
20	20.33	2.25	40%	41%	20.85	2.88	35%	49%
50	50.3	2.25	41%	41%	50.78	2.93	35%	49%
100	100.2	2.20	39%	40%	100.84	2.83	33%	48%

^† [SE(Yang)-SE(OLS)]/SE(Yang);
^‡ [SE(Yang)-SE(WLS)]/SE(Yang)

Note that when the population admixture was moderate, the standard errors of the ordinary and weighted least square estimators were similar. However, when the population admixture was more substantial (μ₁=5, μ₂=20), the weighted least square method was more efficient even when there were no family-specific effects. This is because population admixture acts as a source of family-specific effects (Fulker et al. 1999; Abecasis et al. 2000) in which case the weighted least square is more efficient. The efficiency gains of the weighted least square increased with the severity of admixture. When the difference between the intercepts of the two populations was 15, the reduction of the empirical standard error of the estimator was up to 24%. The estimated standard errors were close to the empirical ones.

We compare the efficiency of the proposed methods with Yang et al. (2000), i.e., adding conditional expectations of the genotypes given the minimal sufficient statistics of the null model as additional covariates in the linear model. We see from the bottom panel of Table 2 that the efficiency gains of the proposed methods ranged from 33% to 49%, which were non-trivial. Note that the efficiency gains of the ordinary least square versus the weighted least square were similar when the admixture was moderate (the left panel in Table 2). When the admixture was more severe (the right panel in Table 2), the efficiency gains increased from about 35% in the ordinary least square to about 49% in the weighted least square.
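The efficiency gains reported in the bottom panel of Table 2 follow the footnote definition [SE(Yang) − SE(proposed)]/SE(Yang). As a small worked check, the sketch below recomputes the gains for the β = 10 rows from the empirical standard errors printed in Table 2; the function and dictionary names are hypothetical.

```python
def efficiency_gain(se_reference, se_proposed):
    """Relative reduction in standard error: [SE(ref) - SE(proposed)] / SE(ref)."""
    return (se_reference - se_proposed) / se_reference

# Empirical standard errors from the beta = 10 rows of Table 2.
moderate = {"yang": 2.20, "ols": 1.32, "wls": 1.32}  # mu1 = 5, mu2 = 10
severe = {"yang": 2.91, "ols": 1.90, "wls": 1.49}    # mu1 = 5, mu2 = 20

gains = {
    "moderate_ols": efficiency_gain(moderate["yang"], moderate["ols"]),
    "moderate_wls": efficiency_gain(moderate["yang"], moderate["wls"]),
    "severe_ols": efficiency_gain(severe["yang"], severe["ols"]),
    "severe_wls": efficiency_gain(severe["yang"], severe["wls"]),
}

for name, gain in gains.items():
    print(f"{name}: {100 * gain:.0f}%")
```

The rounded values match the 40%, 40%, 35% and 49% entries in the corresponding row of the table.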

The second set of simulations corresponds to model (10) where there are random family-specific effects. The variance of these effects was fifteen. Again we examined the estimator both under a null model (β = 0) and under several alternative models (β > 0). The same phenomenon of the unadjusted estimators reporting spurious genetic effect when the true β was zero while the adjusted estimators were very close to zero was also observed for this set of simulations. From Table 3, we also see that both the ordinary least square and the weighted least square methods provided unbiased estimates. As expected, the weighted least square estimates were more efficient. When the admixture was moderate, using weighted least square instead of ordinary least square reduced the empirical standard error by up to 10%. When the population admixture was more severe, the corresponding reduction was up to 29%. We noticed larger efficiency gains of the proposed methods over Yang et al. (2000) in this set of simulations (the bottom panel of Table 3). The efficiency gains ranged from 40% to 60%, which are again substantial.

Table 3.

Estimates with correctly specified W: with family-specific effects, var(α_i) = 15

Ordinary least square estimates (OLS)
	μ₁=5, μ₂=10				μ₁ =5, μ₂ =20

True β	Unadjusted Mean β̂	Adjusted Mean β̂	Empirical S.E.	Estimated S.E.	Unadjusted Mean β̂	Adjusted Mean β̂	Empirical S.E.	Estimated S.E.
0	1.53	−0.05	1.57	1.51	4.51	0.02	2.02	2.01
10	11.54	10.03	1.53	1.53	14.53	9.92	2.04	2.01
20	21.44	19.95	1.51	1.52	24.45	20.02	2.04	2.02
50	51.50	49.99	1.51	1.52	54.50	49.92	2.05	2.02
100	101.49	99.91	1.54	1.52	104.49	99.98	2.09	2.02

Weighted least square estimates (WLS)
0	1.07	−0.04	1.42	1.40	1.96	−0.03	1.42	1.48
10	11.05	10.04	1.41	1.41	11.96	9.96	1.53	1.49
20	21.07	20.0	1.36	1.40	22.04	20.02	1.45	1.48
50	51.07	50.0	1.36	1.40	51.96	49.95	1.50	1.48
100	101.1	99.9	1.40	1.41	102.02	100.08	1.50	1.48

Comparing to Yang et al. (2000)
True β	Yang Mean β̂	Empirical S.E.	Eff. gain OLS^†	Eff. gain WLS^‡	Yang Mean β̂	Empirical S.E.	Eff. gain OLS^†	Eff. gain WLS^‡

10	10.50	3.01	49%	53%	10.81	3.59	43%	57%
20	20.23	2.99	49%	55%	20.78	3.65	44%	60%
50	50.24	3.02	50%	55%	50.80	3.61	43%	58%
100	100.23	3.09	50%	55%	100.80	3.49	40%	57%

^† [SE(Yang)-SE(OLS)]/SE(Yang);
^‡ [SE(Yang)-SE(WLS)]/SE(Yang)

In the third set of simulations, we investigate the unbiasedness of β̂ when W_i is misspecified. We analyzed simulated data with correctly specified allele frequency (0.2), moderately misspecified frequency (0.4) and substantially misspecified frequency (0.9). Tables 4 and 5 summarize results for models (1) and (10). We see that the unbiasedness holds with misspecified W. The mean bias of both methods ranged from 0.01 to 0.3. There appears to be a small-sample bias for the proposed methods when the allele frequency was misspecified as 0.4 or 0.9. However, the bias went away when we increased the sample size to 200 families. Specifically, for the ordinary least square, the mean bias decreased from approximately 0.2 (when p=0.4) to 0.04 and from approximately 0.3 (when p=0.9) to 0.05. For the weighted least square, the mean bias decreased from approximately 0.1 (when p=0.4) to 0.02 and from approximately 0.1 (when p=0.9) to 0.03.

Table 4.

Unbiasedness under misspecification of W: no family-specific effects, μ₁ = 5, μ₂ = 20

Ordinary least square estimates
	p=0.2			p=0.4			p=0.9

True β	Adjusted Mean β̂	Empirical S.E.	Estimated S.E.	Adjusted Mean β̂	Empirical S.E.	Estimated S.E.	Adjusted Mean β̂	Empirical S.E.	Estimated S.E.
10	9.94	1.89	1.88	9.73	1.95	1.86	9.80	1.93	1.83
20	19.97	1.94	1.87	19.68	1.90	1.87	19.81	1.94	1.85
50	49.90	1.92	1.87	49.75	1.94	1.88	49.74	1.91	1.84
100	99.99	1.91	1.87	99.70	1.91	1.87	99.70	1.9	1.82

Weighted least square estimates
10	9.94	1.51	1.47	9.93	1.52	1.48	9.94	1.55	1.51
20	20.00	1.49	1.47	49.90	1.51	1.48	49.88	1.53	1.51
50	49.98	1.48	1.47	49.90	1.51	1.48	49.9	1.56	1.5
100	99.93	1.47	1.47	99.90	1.51	1.48	99.90	1.47	1.51

In the fourth set of simulations, we investigate efficiency loss due to misspecification of W. In Tables 6 and 7, we present the empirical standard errors of the point estimates under different scenarios of misspecification. For model (1), where there are no family-specific effects, when the allele frequency was moderately misspecified (p=0.4), the efficiency loss ranged from 1% to 5%, which was moderate. The efficiency loss for the ordinary least square and weighted least square was similar. When the allele frequency was severely misspecified (p=0.9), the efficiency loss ranged from 7% to 13%. For model (10), where there are family-specific effects, the efficiency loss ranged from 0% to 6%. The magnitudes of the loss were comparable for the moderately and severely misspecified allele frequency.

Table 5.

Unbiasedness under misspecification of W: with family-specific effects, μ₁ = 5, μ₂ = 20, var(α_i) = 25

Ordinary least square estimates
	p=0.2			p=0.4			p=0.9

True β	Adjusted Mean β̂	Empirical S.E.	Estimated S.E.	Adjusted Mean β̂	Empirical S.E.	Estimated S.E.	Adjusted Mean β̂	Empirical S.E.	Estimated S.E.
10	9.95	2.20	2.10	9.80	2.10	2.10	9.70	2.10	2.00
20	19.98	2.15	2.11	19.80	2.20	2.10	19.70	2.05	2.00
50	50.00	2.18	2.11	49.80	2.15	2.10	49.80	2.14	2.02
100	99.95	2.20	2.10	99.70	2.18	2.10	99.70	2.10	2.00

Weighted least square estimates
10	10.05	1.56	1.50	9.97	1.52	1.51	9.99	1.52	1.51
20	19.99	1.50	1.48	20.01	1.49	1.50	19.90	1.49	1.52
50	50.02	1.53	1.49	50.02	1.50	1.49	49.90	1.50	1.52
100	99.95	1.52	1.49	99.90	1.53	1.50	99.90	1.48	1.51

Table 6.

Efficiency loss due to misspecification of W: no family-specific effects, μ₁ = μ₂ = 5

Ordinary least square estimates
	p=0.2		p=0.4			p=0.9

True β	Adjusted Mean β̂	Empirical S.E.	Adjusted Mean β̂	Empirical S.E.	Efficiency Loss	Adjusted Mean β̂	Empirical S.E.	Efficiency loss
10	10.01	1.25	10.04	1.30	4%	10.05	1.34	7%
20	20.01	1.26	19.97	1.27	1%	19.97	1.35	8%
50	50.01	1.27	49.99	1.30	2%	49.99	1.36	7%
100	99.99	1.27	100.03	1.28	1%	100.02	1.40	10%

Weighted least square estimates
10	9.90	1.25	10.04	1.31	5%	10.05	1.35	8%
20	20.01	1.24	19.95	1.27	2%	19.99	1.35	8%
50	50.02	1.24	49.99	1.30	5%	49.99	1.37	10%
100	99.99	1.25	100.02	1.29	3%	100.00	1.41	13%


In the fifth set of simulations, we examine the efficiency loss due to adjusting for population admixture when it is in fact absent. We considered both the case in which family-specific effects were present and the case in which they were absent. Tables 8 and 9 summarize results under these two scenarios. When there was no admixture and the family-specific effects were absent, the efficiency loss of the adjusted analysis compared to the unadjusted ranged from 34% to 38%. The magnitude of loss was similar for the ordinary and weighted least squares. When there were family-specific effects, the efficiency loss was around 28%. The efficiency loss of the ordinary and the weighted least square was also similar.

Table 7.

Efficiency loss due to misspecification of W: with family-specific effects, μ₁ = μ₂ = 5, var(α_i) = 25

Ordinary least square estimates
	p=0.2		p=0.4			p=0.9

True β	Adjusted Mean β̂	Empirical S.E.	Adjusted Mean β̂	Empirical S.E.	Efficiency Loss	Adjusted Mean β̂	Empirical S.E.	Efficiency loss
10	10.02	1.61	9.99	1.67	4%	10.00	1.64	2%
20	19.96	1.59	19.99	1.66	4%	20.05	1.69	6%
50	50.03	1.63	49.99	1.66	2%	49.94	1.67	2%
100	99.99	1.60	99.95	1.60	0%	100.00	1.64	2%

Weighted least square estimates
10	10.03	1.42	9.99	1.46	3%	10.00	1.49	5%
20	19.98	1.43	20.02	1.49	4%	20.05	1.45	1%
50	50.00	1.43	50.01	1.50	5%	49.96	1.51	6%
100	99.98	1.43	99.95	1.44	1%	100.00	1.50	5%


Table 8.

Efficiency loss due to adjusting for admixture when no admixture is present: no family-specific effects, μ₁ = μ₂ = 5

Ordinary least square estimates
	Unadjusted		Adjusted
True β	Mean β̂	Emp. S.E.	Mean β̂	Emp. S.E.	Efficiency Loss^†
10	10.00	0.73	9.99	1.11	34%
20	19.95	0.79	19.95	1.19	34%
50	49.99	0.77	50.02	1.23	37%
100	100.01	0.71	100.03	1.15	38%

Weighted least square estimates
10	10.00	0.73	9.99	1.11	34%
20	19.95	0.79	19.95	1.19	34%
50	49.99	0.78	50.02	1.23	37%
100	100.01	0.71	100.03	1.15	38%


^† [SE (Adjusted) − SE (Unadjusted)]/SE (Adjusted)
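As a quick check on the footnote formula, the following sketch recomputes the efficiency-loss column of Table 8 from the reported empirical standard errors. The function name `efficiency_loss` is ours, introduced only for illustration.

```python
def efficiency_loss(se_adjusted, se_unadjusted):
    """Relative efficiency loss as defined in the table footnote:
    [SE(adjusted) - SE(unadjusted)] / SE(adjusted)."""
    return (se_adjusted - se_unadjusted) / se_adjusted


# Empirical S.E. pairs (unadjusted, adjusted) from the ordinary least
# square block of Table 8, keyed by the true value of beta.
table8_ols = {
    10: (0.73, 1.11),
    20: (0.79, 1.19),
    50: (0.77, 1.23),
    100: (0.71, 1.15),
}

# Efficiency loss in percent, rounded to the nearest integer.
losses = {
    beta: round(100 * efficiency_loss(adj, unadj))
    for beta, (unadj, adj) in table8_ols.items()
}
print(losses)
```

The rounded percentages can be compared directly with the last column of Table 8.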

In the sixth set of simulations, we investigate the influence of departures from the constant family-specific effects model. In Table 10, we compare the empirical standard errors of β̂ under varying values of the variance of the family-specific effects. The variance of α_i in each model was 0, 15 or 25. For the ordinary least square method, the standard error of β̂ increased with increasing variance of the family-specific effects, and the loss of efficiency ranged from 8% to 17%. For the weighted least square method, the standard errors of the estimators were similar regardless of the value of the family-specific variance (efficiency loss up to 3%). In other words, the family-specific effects have little influence on the efficiency of the estimates obtained by the weighted least square.

Table 10.

Effect of departure from constant family-specific effects model (μ₁ = 5, μ₂ = 20)

Ordinary least square estimates
	Var(α_i)=0		Var(α_i)=15			Var(α_i)=25

True β	Adjusted Mean β̂	Empirical S.E.	Adjusted Mean β̂	Empirical S.E.	Increase of S.E.	Adjusted Mean β̂	Empirical S.E.	Increase of S.E.
10	9.98	1.90	9.87	2.13	12%	9.95	2.22	17%
20	19.93	1.88	19.83	2.03	8%	19.98	2.15	14%
50	49.99	1.89	49.90	2.10	11%	50.00	2.18	15%
100	100.00	1.89	99.97	2.11	12%	99.95	2.20	16%

Weighted least square estimates
10	10.02	1.49	9.93	1.51	1%	10.05	1.49	0%
20	19.95	1.48	19.87	1.51	2%	19.99	1.50	1%
50	50.02	1.48	49.93	1.50	1%	50.02	1.49	1%
100	100.05	1.47	99.99	1.48	1%	99.95	1.52	3%

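The "Increase of S.E." column of Table 10 is not defined in a footnote. The reported percentages are consistent with the relative increase over the Var(α_i) = 0 baseline, which is the definition assumed in the sketch below; note that its denominator differs from the one in the efficiency-loss footnote of Tables 8 and 9. The function name `se_increase` is illustrative.

```python
def se_increase(se, se_baseline):
    """Relative increase of a standard error over a baseline value.
    Assumed definition (not stated in the table): (se - se0) / se0."""
    return (se - se_baseline) / se_baseline


# Ordinary least square rows of Table 10:
# true beta -> empirical S.E. at Var(alpha_i) = 0, 15 and 25.
table10_ols = {
    10: (1.90, 2.13, 2.22),
    20: (1.88, 2.03, 2.15),
    50: (1.89, 2.10, 2.18),
    100: (1.89, 2.11, 2.20),
}

# Percent increase at Var(alpha_i) = 15 and 25 relative to Var(alpha_i) = 0.
increase = {
    beta: tuple(round(100 * se_increase(se, se0)) for se in (se15, se25))
    for beta, (se0, se15, se25) in table10_ols.items()
}
print(increase)
```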

4 Data Analysis

In this section, the proposed approaches are applied to an analysis of the influence of APOE genotype on plasma LDL concentrations in young children. There are three common alleles at the APOE locus (ε2, ε3, ε4). The apo ε3 is the most prevalent allele in the general population, with a frequency of 75% to 80%. The frequency of apo ε4 allele varies with ethnicity (Howard et al. 1998). Previous studies of adults have shown that apo ε4 allele is associated with higher LDL cholesterol levels compared to ε3 (Davignon et al. 1988), while the role of ε2 is more complicated.

The effect of the APOE gene was found to be larger in younger people (Hixson et al. 1991). Children included in this data analysis were recruited through the Columbia University BioMarkers Study, a cross-sectional study of children and their parents conducted from 1994 to 1998 (Isasi et al. 2000, Shea et al. 1999). Families were recruited from lists of cardiac patients generated through the Presbyterian Hospital Clinical Information System, private cardiology practices, lipid clinics, pediatric practices at Columbia-Presbyterian Medical Center, and fliers posted within the medical center. Families with at least one healthy child, 4 to 25 years of age, were eligible for participation. Healthy was defined as not having any chronic medical condition under treatment by a pediatrician, other than high blood pressure or high lipids (referral criteria to the Children's Cardiovascular Health Center). Some subjects were recruited through family members other than the children. Around 75% of the children were Hispanic and the remaining 25% were non-Hispanic White.

There were 621 children recruited for the study, among whom 55 did not have data on the APOE genotype. Among the children with genotype data, 10 were excluded because of Mendelian genotyping errors. There were 13 children without LDL concentration data. These children contributed to the computation of the additional covariates, but were not included in the subsequent regression analysis relating LDL to genotype because of missing LDL levels. The mean LDL of the 534 children was 103.8 and the standard deviation was 43.5. The frequencies of children with APOE genotypes ε2ε2, ε3ε2, ε3ε3, ε4ε2, ε4ε3, and ε4ε4 were 0.9%, 7.9%, 61.8%, 1.8%, 25.6%, and 2.0%, respectively. The mean LDL concentrations for children in each genotype group were 49.8, 84.9, 106.1, 107.2, 105.4, and 105.9, respectively.

Among the 547 children from 322 families with genotype data, 153 in 78 families had complete parental genotypes, 389 in 241 families had parental genotype available in one of the parents, and 5 of them in 3 families had no genotype information on any of the parents.

Three sets of analyses were presented. First, the unadjusted analysis was carried out. Then the analysis was repeated with two approaches to adjust for admixture: the first was the methods proposed in Yang et al. (2000), which computed the additional covariates as conditional expectation of the founder genotypes given the minimal sufficient statistics of the genetic model under the null; the second was the methods proposed here, which computed the additional covariate by (8) or (16). Results from the three analyses were compared.

In each set of the analyses, models with and without a family-specific random effect were fit to the data. The ordinary least square estimates for model (1) were obtained by fitting a simple linear model, while weighted least square estimates for model (10) were obtained by fitting a mixed effects model with random family-specific effects. Standard errors were computed as in (9) or (17). The APOE genotype was coded as the number of each of the three APOE alleles carried by a subject. In this data analysis example, Y_ij is the LDL concentration for the j^th child in the i^th family, X(G_ij) = (X^ε²(G_ij), X^ε³(G_ij), X^ε⁴(G_ij))^T are the numbers of ε2, ε3, and ε4 alleles carried by the child, β = (β₂, β₃, β₄)^T are the effects of each of the three alleles, and α_i is the family-specific random effect.

Results of the unadjusted analysis were summarized in Table 11. The significant contrasts were β₃ − β₂ and β₄ − β₂. For the ordinary least square, the estimated differences were 19.9 (SE: 5.4) and 20.7 (SE: 6.2). The interpretation for this analysis was that children carrying the apo ε2 allele had a significantly lower LDL concentration than children with the ε3 or ε4 allele.

Table 11.

Real data example: the unadjusted analysis

Ordinary least square estimates
Parameter	Estimate	Standard error	p value
ε2	33.0	5.2	< 0.001
ε3	52.9	1.2	< 0.001
ε4	53.7	3.2	< 0.001
ε3 − ε2	19.9	5.4	< 0.001
ε4 − ε3	0.8	3.7	0.83
ε4 − ε2	20.7	6.2	< 0.001

Weighted least square estimates
ε2	33.9	6.3	< 0.001
ε3	52.7	1.3	< 0.001
ε4	56.6	2.8	< 0.001
var (α_{i}^{2})	721.5	145.4	< 0.001
var (σ_{i}^{2})	1159.7	115.1	< 0.001
ε3 − ε2	18.8	6.5	< 0.001
ε4 − ε3	3.9	3.4	0.24
ε4 − ε2	22.7	6.7	< 0.001


For the first adjusted analysis, the FBAT (Rabinowitz and Laird 2000, Horvath et al. 2001) was used to compute the conditional expectation of X(G_i) given the minimal sufficient statistics of parental genotypes. Then a linear model was fit using these conditional expectations as additional covariates as in Yang et al. (2000). The results were summarized in Table 12. In this analysis the parameter estimates for β were not identifiable, but the contrasts remained identifiable. The significant contrasts were still β₃ − β₂ and β₄ − β₂ as in the unadjusted analysis. The values of the contrasts were 24.5 (SE: 11.1) and 31.7 (SE: 11.0) for the ordinary least square, and 22.0 (SE: 9.8) and 30.1 (SE: 10.1) for the weighted least square. Note that when using the weighted least square method, the effect for β₃ − β₂ changed from 18.8 (unadjusted) to 22.0 (adjusted), and the effect for β₄ − β₂ changed from 22.7 (unadjusted) to 30.1 (adjusted). Similar magnitude of increase was observed for the ordinary least square estimates.

Table 12.

Real data example: adjusting by Yang et al. (2000)

Ordinary least square estimates
Parameter	Estimate	Standard error	p value
ε3 − ε2	24.5	11.1	0.03
ε4 − ε3	7.2	4.7	0.12
ε4 − ε2	31.7	11.0	0.004

Weighted least square estimates
var (α_{i}^{2})	718.8	146.1	< 0.001
var (σ_{i}^{2})	1165.0	116.0	< 0.001
ε3 − ε2	22.0	9.8	0.03
ε4 − ε3	8.2	4.3	0.06
ε4 − ε2	30.1	10.1	0.003


For the second adjusted analysis using the methods developed here, the matrices W_i and Z_i are required. The genotype frequencies in W_i were computed using observed founder genotypes. We have shown that misspecification of W_i does not affect the unbiasedness of β̂. To illustrate the computation of Z_i and the additional covariates, the calculation was carried out in the appendix for an example pedigree with two children and one observed heterozygous parent.

The results of these analyses were summarized in Table 13. The significant estimated contrasts were β₃ − β₂ and β₄ − β₂, with values 18.6 (SE: 6.9) and 19.9 (SE: 6.9) for the ordinary least square, and 17.6 (SE: 7.3) and 22.0 (SE: 7.3) for the weighted least square. The changes of the contrasts in the proposed adjustment were smaller compared to the Yang adjustment, and the standard errors were also smaller. These comparisons suggest that applying Yang et al. (2000) may have over-corrected for population admixture. Furthermore, the larger standard errors for the contrasts in the Yang analysis compared to the proposed analysis reflected the loss of information by conditioning on the minimal sufficient statistics in the Yang analysis. We can also see that the weighted least square estimates had smaller standard error than the ordinary least squares estimates.

Table 13.

Real data example: adjusting by the proposed method

Ordinary least square estimates
Parameter	Estimate	Standard error	p value
ε3 − ε2	18.6	6.9	0.007
ε4 − ε3	1.3	3.6	0.72
ε4 − ε2	19.9	6.9	0.004

Weighted least square estimates
var (α_{i}^{2})	734.7	146.6	< 0.001
var (σ_{i}^{2})	1157.2	114.6	< 0.001
ε3 − ε2	17.6	7.3	0.02
ε4 − ε3	4.4	3.4	0.20
ε4 − ε2	22.0	7.3	0.003

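The p values reported for the contrasts in Tables 11 to 13 are not accompanied by a description of the reference distribution. As a rough check, the sketch below assumes a two-sided Wald test with a standard normal reference, which approximately recovers the ordinary least square p values in Table 13. The function name `wald_p_value` is illustrative, and the normal reference is our assumption rather than a stated detail of the analysis.

```python
import math


def wald_p_value(estimate, std_error):
    """Two-sided p value for H0: contrast = 0, using z = estimate / SE
    and a standard normal reference distribution (an assumption)."""
    z = abs(estimate / std_error)
    # For a standard normal Z, P(|Z| > z) = erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2.0))


# (estimate, standard error) for the ordinary least square contrasts in Table 13.
contrasts = {
    "e3 - e2": (18.6, 6.9),
    "e4 - e3": (1.3, 3.6),
    "e4 - e2": (19.9, 6.9),
}

p_values = {name: wald_p_value(est, se) for name, (est, se) in contrasts.items()}
for name, p in p_values.items():
    print(f"{name}: p = {p:.3f}")
```

Small discrepancies from the published p values are to be expected if the original analysis used a t reference distribution or unrounded estimates.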

5 Discussion

Here a locally efficient approach to adjusting for population admixture when estimating the genetic effect on a quantitative trait is proposed. The main step is to augment a linear regression model with supplemental covariates, which provides an unbiased minimum-variance estimator for the genetic parameter of interest. The form of the additional covariates is similar in spirit to that proposed in Rabinowitz (2002) and Whittemore (2004). The models in (1) and (10) can be extended to include environmental factors.

In the testing context, it was observed by Whittemore and Halpern (2003) that both Rabinowitz and Laird (2000) and Rabinowitz (2002) can be formulated as solutions to a constrained optimization problem: the coefficient of variation of the test statistic is to be maximized under some constraint. In Rabinowitz and Laird (2000) the constraint is that the conditional expectation of the test statistic given the minimal sufficient statistic of the genetic model under the null hypothesis is zero, while in Rabinowitz (2002) the constraint is that the conditional expectation of the test statistic given the founder genotypes is zero. The column space of the constraints in the former contains the corresponding column space of the constraints of the latter: the vectors in the latter space can be expressed as linear combinations of vectors in the former space. This observation implies that the constraints in Rabinowitz and Laird (2000) are too restrictive and there is potential loss of information incurred by conditioning on a larger space. For example, families where all children have the same genotype do not contribute to the analysis. In contrast, the methods in Rabinowitz (2002) capture all the available information.

In the estimation context, the analogous comparison of efficiency is between Yang et al. (2000), which was based on Rabinowitz and Laird (2000), and the proposed approach, which has a similar form to Rabinowitz (2002) and Whittemore (2004). Yang et al. (2000) corresponds to projecting covariates involving genotypes subject to population admixture onto the space of the minimal sufficient statistics, while the proposed approach corresponds to projecting the genotype-related covariates onto an appropriate smaller space. The larger the space of projection, the more information is lost. Our simulation results suggest non-trivial efficiency gains of the proposed methods over Yang et al. (2000). It can also be seen from the real data analysis example that the standard errors of the proposed methods were smaller, which suggests over-adjustment in Yang et al. (2000) because conditioning on the minimal sufficient statistic may be too restrictive.

It is shown in the appendix that the proposed approach is optimal when there is no family-specific effect or when the family-specific effect is a constant. When such effect is not a constant, the proposed approach is locally optimal because the family-specific effect is approximately a constant (the variance of such effect is zero) when considered locally. We used simulation studies to investigate the efficiency of the estimators when the family-specific effect is not a constant. The efficiency of the ordinary least square method was reduced by up to 17% (Table 10), but the efficiency of the weighted least square method was not greatly influenced by the departure from the constant family-specific effect model (efficiency loss up to 3%, see Table 10).

The added covariates U_i involve marginal probabilities W_i, which are estimated (possibly incorrectly) from the data. Estimators that depend on such estimated values would normally carry extra variability. However, since the additional covariates X_i − U_i can be viewed as residuals from the projection of X_i onto the space spanned by W_i, they are orthogonal to W_i. By this orthogonality, there is no additional variability introduced by estimating W_i.

Since population admixture acts as a source of family-specific effects, the weighted least square is more efficient than the ordinary least square even when there are no additional family-specific effects. For the real data analysis, the weighted least square should be used.

The proposed methods are designed for the single-locus model or the multi-locus model without interaction. When there are multiple loci predisposing to a disease and there is no interaction between the loci, we compute a set of optimal covariates for each locus using founder genotypes at this locus. All the optimal covariates will then be included in the linear model analyses. When there is interaction between the loci, the current approaches need to be modified to a haplotype-based method to account for this effect because haplotype association analysis may be more powerful than genotype association analysis (Morris and Kaplan 2002). However, one complication faced by a haplotype analysis is that the phase of a haplotype is usually not observed. In a haplotype analysis, when the phase is known, one uses functions of haplotypes as predictors. When the phase is unknown, one uses the conditional distribution of the haplotypes given the genotypes to compute the conditional expectation of phase-unknown haplotype scores. The conditional distribution depends on the marginal distribution of haplotypes, which may be subject to population stratification or may be estimated using only approximate assumptions (e.g. Hardy-Weinberg equilibrium). To extend the methods developed here in this context, the covariates X_i in the equation (8) should be replaced by the estimated conditional expectation of the functions of haplotypes. The W_i should be replaced by the marginal distribution used in the calculations of X_i, and Z_i should be replaced by the matrix of conditional probability of all possible haplotypes given the parental haplotypes. Further research along this direction is underway.

The proposed methods are easy to implement and the computational cost is low. The extra computation involved other than fitting a linear model is to compute the additional optimal covariates. The form of these covariates (see (8)) suggests that they are constructed using marginal distribution of founder genotypes and conditional distribution of the offspring genotypes and involve some matrix algebra. These computations do not entail iterations and can be completed in seconds. In our simulations, it took half a minute to compute the optimal covariates for 1000 repetitions on a Dell Workstation with 2.00GHz CPU. A link to the code to compute the optimal covariates can be found at www.columbia.edu/~yw2016.

Here the methods are developed in the context of random sampling. When subjects with extreme values of a quantitative trait are over-sampled or when certain outcomes are over-sampled, these methods are generally biased. In such settings, the more general estimating equation conditioning on the outcomes proposed in Whittemore (2004) may be applicable.

Table 9.

Efficiency loss due to adjusting for admixture when no admixture is present: with family-specific effects, μ₁ = μ₂ = 5, var(α_i) = 15

Ordinary least square estimates
	Unadjusted		Adjusted
True β	Mean β̂	Emp. S.E.	Mean β̂	Emp. S.E.	Efficiency Loss^†
10	9.99	1.01	9.97	1.41	28%
20	20.01	1.02	19.98	1.42	28%
50	50.02	1.01	50.02	1.39	27%
100	99.98	1.02	99.98	1.41	28%

Weighted least square estimates
10	9.99	0.97	9.97	1.34	28%
20	19.99	0.97	19.98	1.32	27%
50	50.01	0.95	50.02	1.31	27%
100	99.99	0.95	99.98	1.32	28%


^† [SE (Adjusted) − SE (Unadjusted)]/SE (Adjusted)

Acknowledgments

Yuanjia Wang’s research is supported by NIH AG031113-01A2. We thank Dr. Steve Shea for providing the LDL data used to illustrate the proposed methods.

Appendix

In this section the expectation and the variance of β̂ are computed, the solution to the constrained optimization problem is derived, and the efficient additional covariates for an example pedigree are computed.

From the expression β̂ = [(X − U)^T(X − U)]⁻¹(X − U)^TY and the model (2), the expectation of β̂ is

\begin{array}{l} E \hat{β} = E \{ [(X - U)^{T} (X - U)]^{-1} (X - U)^{T} Y \} \\ = β + E \{ [(X - U)^{T} (X - U)]^{-1} (X - U)^{T} U \tilde{γ} \} + E \{ [(X - U)^{T} (X - U)]^{-1} (X - U)^{T} ε \} \\ = β + E \{ [(X - U)^{T} (X - U)]^{-1} \sum_{i} E [(X_{i} - U_{i})^{T} 1_{n_{i}} ∣ G_{i}^{★}] E [ε_{i j} ∣ G_{i}^{★}] \}, \end{array}

where 1_{n_i} denotes the n_i × 1 vector of 1. Here the third equality follows from the constraint (4) and the fact that given the founder genotypes, X(G) is conditionally independent of ε. It follows that the condition to ensure the unbiasedness of β̂ is that

E [{(X_{i} - U_{i})}^{T} 1_{n_{i}} ∣ G_{i}^{★}] = 0, i = 1, \dots, n .

Now turn to the computation of the variance. We have the expression

var (\hat{β}) = var (E (\hat{β} ∣ G^{★})) + E (var (\hat{β} ∣ G^{★})) .

Under the conditions (4) and (7), we have E(β̂|G^★) = β. Therefore the first term on the right hand side of the expression is zero. Let A_i denote the vector X(G_i) − U_i, and let A denote the matrix ${(A_{1}^{T}, \dots, A_{n}^{T})}^{T}$ . The second term can be calculated as

\begin{array}{l} var (\hat{β} ∣ G^{★}) = E \{ [A^{T} A]^{-1} \sum_{i} A_{i}^{T} ε_{i} ε_{i}^{T} A_{i} [A^{T} A]^{-1} ∣ G^{★} \} \\ = \sum_{i} E (ε_{i j}^{2} ∣ G_{i}^{★}) E \{ [A^{T} A]^{-1} A_{i}^{T} A_{i} [A^{T} A]^{-1} ∣ G^{★} \} \\ = σ^{2} E \{ [A^{T} A]^{-1} ∣ G^{★} \} . \end{array}

Here σ² is the variance of the residuals. The second equality follows from the conditional independence of ε_ij and G_ij given $G_{i}^{★}$ . Taking the expectation we obtain

var (\hat{β}) = σ^{2} E {[A^{T} A]}^{- 1} .

(18)

Similarly, when there are family-specific effects, expectation of the weighted least square β̂′ as in (11) is

\begin{array}{l} E {\hat{β}}^{'} = E \{ [(X - U)^{T} Σ^{-1} (X - U)]^{-1} (X - U)^{T} Σ^{-1} Y \} \\ = E \{ [(X - U)^{T} Σ^{-1} (X - U)]^{-1} (X - U)^{T} Σ^{-1} [(X - U) β + U \tilde{γ} + α + ε] \} \\ = β + E \{ [(X - U)^{T} Σ^{-1} (X - U)]^{-1} \sum_{i} E [(X_{i} - U_{i})^{T} Σ_{i}^{-1} 1_{n_{i}} ∣ G_{i}^{★}] E [α_{i} ∣ G_{i}^{★}] \} + E \{ [(X - U)^{T} Σ^{-1} (X - U)]^{-1} \sum_{i} E [(X_{i} - U_{i})^{T} Σ_{i}^{-1} 1_{n_{i}} ∣ G_{i}^{★}] E [ε_{i j} ∣ G_{i}^{★}] \} . \end{array}

The conditional variance is

\begin{array}{l} var ({\hat{β}}^{'} ∣ G^{★}) = E \{ [A^{T} Σ^{-1} A]^{-1} \sum_{i} A_{i}^{T} Σ_{i}^{-1} (α_{i}^{2} 1_{n_{i}} 1_{n_{i}}^{T} + ε_{i} ε_{i}^{T}) Σ_{i}^{-1} A_{i} [A^{T} Σ^{-1} A]^{-1} ∣ G^{★} \} \\ = E \{ [A^{T} Σ^{-1} A]^{-1} \sum_{i} A_{i}^{T} Σ_{i}^{-1} E [α_{i}^{2} 1_{n_{i}} 1_{n_{i}}^{T} + ε_{i} ε_{i}^{T} ∣ G_{i}^{★}] Σ_{i}^{-1} A_{i} [A^{T} Σ^{-1} A]^{-1} ∣ G^{★} \} \\ = \sum_{i} E \{ [A^{T} Σ^{-1} A]^{-1} A_{i}^{T} Σ_{i}^{-1} A_{i} [A^{T} Σ^{-1} A]^{-1} ∣ G^{★} \} \\ = E \{ [A^{T} Σ^{-1} A]^{-1} ∣ G^{★} \} . \end{array}

Here the second equality follows from the conditional independence of α_i and G_ij given $G_{i}^{★}$, and the third equality follows from $Σ_{i} = var (Y_{i} Y_{i}^{T} ∣ G_{i}^{★})$. Taking expectation, we have

var ({\hat{β}}^{'}) = E {[A^{T} Σ^{- 1} A]}^{- 1} .

(19)

Therefore minimizing the variance of weighted least square when there are family-specific effects amounts to minimizing (X − U)^TΣ⁻¹(X − U).
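To make the weighted least square estimator above concrete, here is a minimal sketch for a single scalar covariate. It assumes the compound-symmetry covariance Σ_i = var(α_i) 1 1^T + σ² I implied by a random family effect with independent residuals, and inverts it in closed form with the Sherman-Morrison formula. The function names, the toy families, and the variance components are hypothetical, chosen only for illustration; they are not taken from the paper's software.

```python
def cs_inverse(n, var_alpha, var_eps):
    """Inverse of Sigma_i = var_alpha * 1 1^T + var_eps * I (n x n),
    obtained from the Sherman-Morrison formula."""
    c = var_alpha / (var_eps * (var_eps + n * var_alpha))
    return [[(1.0 / var_eps if r == s else 0.0) - c for s in range(n)]
            for r in range(n)]


def wls_estimate(families, var_alpha, var_eps):
    """families: list of (a_i, y_i) pairs, where a_i holds the scalar
    covariate X_ij - U_ij and y_i the traits of the children in family i.
    Returns (beta_hat, [A^T Sigma^-1 A]^-1)."""
    num = 0.0  # accumulates A^T Sigma^-1 Y
    den = 0.0  # accumulates A^T Sigma^-1 A
    for a, y in families:
        n = len(a)
        s_inv = cs_inverse(n, var_alpha, var_eps)
        s_inv_a = [sum(s_inv[r][s] * a[s] for s in range(n)) for r in range(n)]
        num += sum(v * yy for v, yy in zip(s_inv_a, y))
        den += sum(v * aa for v, aa in zip(s_inv_a, a))
    return num / den, 1.0 / den


# Hypothetical toy data: three families with 2, 1 and 3 children.
families = [
    ([0.3, -0.7], [5.0, 3.0]),
    ([-0.2], [4.0]),
    ([0.5, 0.5, -1.0], [6.0, 7.0, 2.0]),
]

beta_wls, var_wls = wls_estimate(families, var_alpha=25.0, var_eps=100.0)
beta_ols, var_ols = wls_estimate(families, var_alpha=0.0, var_eps=100.0)
print(beta_wls, var_wls)
print(beta_ols, var_ols)
```

Setting `var_alpha` to zero reduces the estimator to the ordinary least square form, with variance σ² [A^T A]⁻¹ as in (18).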

Since by a linear transformation, solving U for the weighted least square estimator can be converted to solving U for the ordinary least square, we solve the constrained optimization problem for the latter. To minimize the variance (18) subject to constraints (4) and (7), we introduce Lagrange equations. Recall that $ϑ_{i}^{★}$ denotes all possible founder genotypes compatible with that observed in the i^th family, and ϑ_i denotes all possible combinations of offspring genotypes in the i^th family. The objective function is

\sum_{i = 1}^{n} \sum_{g \in ϑ_{i}} \sum_{j = 1}^{n_{i}} {(X_{i j} (g) - U_{i j} (g))}^{2} P (g) - \sum_{i = 1}^{n} \sum_{g \in ϑ_{i}} \sum_{g^{★} \in ϑ_{i}^{★}} \sum_{j = 1}^{n_{i}} λ_{i, g^{★}} (X_{i j} (g) - U_{i j} (g)) P (g ∣ g^{★}) - η \sum_{i = 1}^{n} \sum_{g \in ϑ_{i}} \sum_{j = 1}^{n_{i}} (X_{i j} (g) - U_{i j} (g)) U_{i j} (g) P (g) .

Here λ_{i,g^★} and η are Lagrange multipliers. The term U_ij(g) is the additional covariate to add when the observed genotypes in the offspring are g.

It is convenient to write the objective function in a matrix form and do the calculation in matrix algebra. Recall the notations for W_i, X_i and V_i defined in section 2, and let λ_i denote the d_i × 1 vector of λ_i,g^★. The objective function can be written as

\sum_{i} 1_{n_{i}}^{T} (X_{i} - V_{i}) W_{i} {(X_{i} - V_{i})}^{T} 1_{n_{i}} - \sum_{i} λ_{i}^{T} Z_{i}^{T} {(X_{i} - V_{i})}^{T} 1_{n_{i}} - η \sum_{i} 1_{n_{i}}^{T} (X_{i} - V_{i}) W_{i} V_{i}^{T} 1_{n_{i}},

and the Lagrange equations can be written as

2 W_{i} {(X_{i} - V_{i})}^{T} 1_{n_{i}} - Z_{i} λ_{i} - η W_{i} {(X_{i} - 2 V_{i})}^{T} 1_{n_{i}} = 0

(20)

Z_{i}^{T} {(X_{i} - V_{i})}^{T} 1_{n_{i}} = 0

(21)

\sum_{i} 1_{n_{i}}^{T} (X_{i} - V_{i}) W_{i} V_{i}^{T} 1_{n_{i}} = 0 .

(22)

Multiplying both sides of (20) on the left by $Z_{i}^{T} W_{i}^{- 1}$ and using the condition (21) results in

λ_{i} = η {(Z_{i}^{T} W_{i}^{- 1} Z_{i})}^{- 1} Z_{i}^{T} X_{i}^{T} 1_{n_{i}} .

(23)

Plug (23) into (20) to get

(2 - η) W_{i} {(X_{i} - V_{i})}^{T} 1_{n_{i}} - η Z_{i} {(Z_{i}^{T} W_{i}^{- 1} Z_{i})}^{- 1} Z_{i}^{T} X_{i}^{T} 1_{n_{i}} + η W_{i} V_{i}^{T} 1_{n_{i}} = 0.

Solve this equation to arrive at

V_{i}^{T} 1_{n_{i}} = W_{i}^{- 1} Z_{i} {(Z_{i}^{T} W_{i}^{- 1} Z_{i})}^{- 1} Z_{i}^{T} X_{i}^{T} 1_{n_{i}},

(24)

and η = 2. Here V_i is identifiable up to the sum of its components. Adding and subtracting a constant from the elements of V_i does not change the sum of all the elements. Nevertheless, the expectation of the estimator is the same. We simply pick

V_{i} = X_{i} Z_{i} {(Z_{i}^{T} W_{i}^{- 1} Z_{i})}^{- 1} Z_{i}^{T} W_{i}^{- 1} .

The desired additional covariates U_i can be picked from the row of V_i which corresponds to the observed genotypes in family members of the i^th family. The marginal probabilities W_i can be computed as $P (g) = \sum_{g^{*} \in ϑ_{i} (g^{* *})} P (g ∣ g^{*}) P (g^{*} ∣ g^{* *})$. Here g** indexes the observed founder genotypes in ϑ_i. Note that the solution under the constraints (20) and (21) satisfies (22) automatically.

Finally the computation for an illustrative example pedigree with two children is presented. Suppose that the example pedigree has two children with genotypes (DD, Dd), one parent with observed genotype Dd, and the other parent with no genotype information. The parental genotypes compatible with the observed parental genotype are $ϑ_{i}^{★} = \{(D d, D D), (D d, D d), (D d, d d)\}$. The 9 possible genotype configurations for the children are listed as the rows in Table A1. Here c_i = 9 and d_i = 3. The entries of matrix Z_i are presented in Table A1.

Table A1.

The entries of Z_i for the example pedigree

	Parental genotypes
Offspring genotypes	(Dd, DD)	(Dd, Dd)	(Dd, dd)
(DD, DD)	1/4	1/16	0
(DD, Dd)	1/4	1/8	0
(DD, dd)	0	1/16	0
(Dd, DD)	1/4	1/8	0
(Dd, Dd)	1/4	1/4	1/4
(Dd, dd)	0	1/8	1/4
(dd, DD)	0	1/16	0
(dd, Dd)	0	1/8	1/4
(dd, dd)	0	1/16	1/4

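The entries of Table A1 follow from Mendelian transmission applied independently to each child. A minimal sketch that rebuilds Z_i for this pedigree is given below, using exact fractions. The helper names (`gamete_dist`, `offspring_dist`, `build_Z`) are ours, and the code is specific to a biallelic marker with alleles D and d.

```python
from fractions import Fraction
from itertools import product

GENOTYPES = ("DD", "Dd", "dd")


def gamete_dist(genotype):
    """Probability that a parent with this genotype transmits D or d."""
    n_D = genotype.count("D")
    return {"D": Fraction(n_D, 2), "d": Fraction(2 - n_D, 2)}


def offspring_dist(parent1, parent2):
    """Distribution of one child's genotype given both parental genotypes."""
    dist = {g: Fraction(0) for g in GENOTYPES}
    for (a1, p1), (a2, p2) in product(gamete_dist(parent1).items(),
                                      gamete_dist(parent2).items()):
        n_D = (a1 == "D") + (a2 == "D")
        child = {2: "DD", 1: "Dd", 0: "dd"}[n_D]
        dist[child] += p1 * p2
    return dist


def build_Z(parental_pairs, n_children=2):
    """Rows: offspring genotype configurations; columns: parental pairs.
    Children are conditionally independent given the parental genotypes."""
    Z = {}
    for config in product(GENOTYPES, repeat=n_children):
        row = []
        for p1, p2 in parental_pairs:
            d = offspring_dist(p1, p2)
            prob = Fraction(1)
            for g in config:
                prob *= d[g]
            row.append(prob)
        Z[config] = row
    return Z


# Parental pairs in the column order of Table A1.
pairs = [("Dd", "DD"), ("Dd", "Dd"), ("Dd", "dd")]
Z = build_Z(pairs)
for config, row in Z.items():
    print(config, [str(x) for x in row])
```

Each column of Z_i is a probability distribution over the nine offspring configurations, so it sums to one.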

Code X_i as the number of D alleles, then from the equation (24), the matrix V_i can be calculated. The results under different assumptions of the allele frequency are recorded in Table A2. It can be seen that there is no big difference in V_i when we change the allele frequency.

Table A2.

The matrix X_i and V_i for the example pedigree

		V_{i}^{T}
Offspring genotypes	X_{i}^{T}	p = 0.1	p = 0.2	p = 0.4
(DD, DD)	(2, 2)	(2.04, 2.04)	(1.98, 1.98)	(1.86, 1.86)
(DD, Dd)	(2, 1)	(1.68, 1.68)	(1.66, 1.66)	(1.62, 1.62)
(DD, dd)	(2, 0)	(1.24, 1.24)	(1.18, 1.18)	(1.06, 1.06)
(Dd, DD)	(1, 2)	(1.68, 1.68)	(1.66, 1.66)	(1.62, 1.62)
(Dd, Dd)	(1, 1)	(0.6, 0.6)	(0.7, 0.7)	(0.9, 0.9)
(Dd, dd)	(1, 0)	(0.48, 0.48)	(0.46, 0.46)	(0.42, 0.42)
(dd, DD)	(0, 2)	(1.24, 1.24)	(1.18, 1.18)	(1.06, 1.06)
(dd, Dd)	(0, 1)	(0.48, 0.48)	(0.46, 0.46)	(0.42, 0.42)
(dd, dd)	(0, 0)	(0.44, 0.44)	(0.38, 0.38)	(0.26, 0.26)


The observed genotypes for children in the example pedigree are (DD, Dd), which correspond to the second row in Table A2. Therefore the additional covariate U_i for this family when the allele frequency is 0.1 is (1.68, 1.68). When the allele frequency is 0.2 or 0.4, the additional covariates for this family are (1.66, 1.66) and (1.62, 1.62), respectively. These covariates are not substantially affected by the specification of allele frequency.
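The calculation for this pedigree can be sketched in a few lines of plain Python. The sketch evaluates V_i = X_i Z_i (Z_i^T W_i⁻¹ Z_i)⁻¹ Z_i^T W_i⁻¹ with Z_i taken from Table A1. Two points are our assumptions rather than statements from the text: W_i is taken to be the diagonal matrix of marginal probabilities P(g), and the genotype of the untyped parent is given Hardy-Weinberg probabilities with p interpreted as the frequency of the D allele. Under these assumptions the hand-checked p = 0.1 entries agree with Table A2. The function names are illustrative.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


GENOS = ("DD", "Dd", "dd")
CONFIGS = [(g1, g2) for g1 in GENOS for g2 in GENOS]  # row order of Table A1

# Z_i from Table A1: P(offspring configuration | parental pair),
# columns ordered as (Dd, DD), (Dd, Dd), (Dd, dd).
Z = [[1/4, 1/16, 0], [1/4, 1/8, 0], [0, 1/16, 0],
     [1/4, 1/8, 0], [1/4, 1/4, 1/4], [0, 1/8, 1/4],
     [0, 1/16, 0], [0, 1/8, 1/4], [0, 1/16, 1/4]]

# X_i^T: number of D alleles carried by each of the two children.
X = [[g.count("D") for g in config] for config in CONFIGS]


def additional_covariates(p):
    """Rows of V_i^T for D-allele frequency p (assumed Hardy-Weinberg prior)."""
    prior = [p * p, 2 * p * (1 - p), (1 - p) ** 2]  # untyped parent: DD, Dd, dd
    w = [sum(z * q for z, q in zip(row, prior)) for row in Z]  # diagonal of W_i
    # M = Z^T W^{-1} Z, a symmetric 3 x 3 matrix.
    M = [[sum(Z[g][k] * Z[g][l] / w[g] for g in range(9)) for l in range(3)]
         for k in range(3)]
    coef = []
    for j in range(2):
        t = [sum(Z[g][k] * X[g][j] for g in range(9)) for k in range(3)]  # Z^T x_j
        coef.append(solve(M, t))  # M^{-1} Z^T x_j
    return {config: tuple(sum(Z[g][k] * coef[j][k] for k in range(3)) / w[g]
                          for j in range(2))
            for g, config in enumerate(CONFIGS)}


for p in (0.1, 0.2, 0.4):
    v = additional_covariates(p)[("DD", "Dd")]
    print(p, tuple(round(val, 2) for val in v))
```

The loop prints the covariate pair U_i for the observed offspring configuration (DD, Dd) at each assumed allele frequency, which can be compared with the second row of Table A2.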

Contributor Information

Yuanjia Wang, Email: yw2016@columbia.edu, Department of Biostatistics, Mailman School of Public Health, Columbia University 722 W168th St., New York, NY 10032, U.S.A.

Qiong Yang, Department of Biostatistics, Boston University.

Daniel Rabinowitz, Department of Statistics, Columbia University.

References

Abecasis GR, Cardon LR, Cookson WOC. A general test of association for quantitative traits in nuclear families. American Journal of Human Genetics. 2000;66:279–292. doi: 10.1086/302698. [DOI] [PMC free article] [PubMed] [Google Scholar]
Allen AS, Satten GA, Tsiatis AA. Locally-efficient robust estimation of haplotype-disease association in family-based studies. Biometrika. 2005;92:559–571. [Google Scholar]
Cox DR, Hinkley CV. Theoretical statistics. London: Chapman & Hall; 1979. [Google Scholar]
Curtis D, Sham PC. A note on the application of the transmission disequilibrium test when a parent is missing. American Journal of Human Genetics. 1995;56:811–812. [PMC free article] [PubMed] [Google Scholar]
Davignon J, Gregg RE, Sing CF. Apolipoprotein E polymorphism and atherosclerosis. Arteriosclerosis. 1988;8:121. doi: 10.1161/01.atv.8.1.1. [DOI] [PubMed] [Google Scholar]
Elston RC. Linkage and association. Genet Epidemiol. 1998:565–576. doi: 10.1002/(SICI)1098-2272(1998)15:6<565::AID-GEPI2>3.0.CO;2-J. [DOI] [PubMed] [Google Scholar]
Falk CT, Rubinstein P. Haplotype relative risks: An easy reliable way to construct a proper control sample for risk calculations. Ann Hum Genet. 1987;51:227233. doi: 10.1111/j.1469-1809.1987.tb00875.x. [DOI] [PubMed] [Google Scholar]
Fulker DW, Cherny SS, Sham PC, Hewitt JK. Combined linkage and association sib-air analysis for quantitative traits. American Journal ofHuman Genetics. 1999;64:259–267. doi: 10.1086/302193. [DOI] [PMC free article] [PubMed] [Google Scholar]
Hardin JW, Hilbe JM. Generalized Estimating Equations. Chapman and Hall; London, UK: 2003. [Google Scholar]
Hixson JE. Apolipoprotein E Polymorphisms Affect Atherosclerosis in Young Males: Pathobiological Determinants of Atherosclerosis in Youth (PDAY) Research Group. Arterioscler Thromb. 1991;11:237–244. doi: 10.1161/01.atv.11.5.1237. [DOI] [PubMed] [Google Scholar]
Horvath S, Xu X, Laird N. The family based association test method: strategies for studying general genotype-phenotype associations. Euro J Hum Gen. 2001;9:301–306. doi: 10.1038/sj.ejhg.5200625. [DOI] [PubMed] [Google Scholar]
Howard BV, Gidding SS, Liu K. Association of apolipoprotein E phenotype with plasma lipoproteins in African-American and white young adults. Am J Epidemiol. 1998;148:859868. doi: 10.1093/oxfordjournals.aje.a009711. [DOI] [PubMed] [Google Scholar]
Isasi CR, Shea S, Deckelbaum RJ, Couch SC, Starc TJ, Otvos JD, Berglund L. Apolipoprotein ε2 allele is associated with an anti-atherogenic Lipoprotein profile in children: the Columbia University Biomarker Study. Pediatrics. 2000;106:568–575. doi: 10.1542/peds.106.3.568. [DOI] [PubMed] [Google Scholar]
Lazzeroni LC, Lange K. A conditional inference framework for extending the transmission/disequilibrium test. Hum Hered. 1998;48:67–81. doi: 10.1159/000022784. [DOI] [PubMed] [Google Scholar]
Morris RW, Kaplan NL. On the advantage of haplotype analysis in the presence of multiple disease susceptibility alleles. Genetic Epidemiology. 2002;23:221–233. doi: 10.1002/gepi.10200. [DOI] [PubMed] [Google Scholar]
Rabinowitz D, Laird N. A unified approach to adjusting association tests for population admixture with arbitrary pedigree structure and arbitrary missing marker information. Human Hered. 2000;50:211–223. doi: 10.1159/000022918. [DOI] [PubMed] [Google Scholar]
Risch N, Merikangas K. The future of genetic studies of complex human diseases. Science. 1996;273:1516–1517. doi: 10.1126/science.273.5281.1516. [DOI] [PubMed] [Google Scholar]
Rabinowitz D. Adjusting for population heterogeneity and misspecified haplotype frequencies when testing nonparametric null hypotheses in statistical genetics. Journal of the American Statistical Association. 2002;92:742–758. [Google Scholar]
Shea S, Isasi CR, Couch S, Starc TJ, Tracy RP, Deckelbaum R, Talmud P, Berglund L, Humphries SE. Relations of plasma fibrinogen level in children to measures of obesity, the (G-455->A) mutation in the beta-fibrinogen promoter gene, and family history of ischemic heart disease: the Columbia University BioMarkers Study. Am J Epidemiol. 1999;150(7):737–46. doi: 10.1093/oxfordjournals.aje.a010076. [DOI] [PubMed] [Google Scholar]
Spielman RS, Ewens WJ. A sib-ship test for linkage in the presence of association: The sib transmission/disequilibrium test. Am J Hum Genet. 1998;62:450–458. doi: 10.1086/301714. [DOI] [PMC free article] [PubMed] [Google Scholar]
Spielman RS, McGinnis RE, Ewens WJ. Transmission test for linkage disequilibrium: The insulin gene region and insulin-dependent diabetes mellitus (IDDM). Am J Hum Genet. 1993;52:506–516. [PMC free article] [PubMed] [Google Scholar]
Terwilliger JD, Ott J. A haplotype-based haplotype relative risk approach to detecting allelic associations. Hum Hered. 1992;42:337–346. doi: 10.1159/000154096. [DOI] [PubMed] [Google Scholar]
Yang Q, Rabinowitz D, Isasi C, Shea S. Adjusting for confounding due to population admixture when estimating the effect of candidate genes on quantitative traits. Human Heredity. 2000;50:227–233. doi: 10.1159/000022920. [DOI] [PubMed] [Google Scholar]
Whittemore A. Estimating genetic association parameter from family data. Biometrika. 2004;91:219–225. [Google Scholar]
Whittemore A, Halpern J. Genetic association tests for family data with missing parental genotypes: a comparison. Genetic Epidemiology. 2003;25:80–91. doi: 10.1002/gepi.10247. [DOI] [PubMed] [Google Scholar]

