Abstract
We consider the problem of case-control association testing in samples that contain related individuals, where we assume the pedigree structure is known. Typically, for each marker tested, some individuals will have missing genotype data. The MQLS method has been proposed for association testing in this situation. We show that the MQLS method is equivalent to an approach in which missing genotypes are imputed using the best linear unbiased predictor (BLUP) based on relatives' genotype data. Viewed this way, the MQLS exactly corrects for the imputation error and for the extra correlation due to imputation. We also investigate the amount of additional power for detecting association that is provided by this BLUP imputation approach.
Key words: GWAS, pedigrees, quasi-likelihood, score test
1. Introduction
Standard methods for case-control association testing typically assume that sampled individuals are unrelated. However, many ongoing genetic studies involve families, so genetic association analysis of such studies requires some way of dealing with related individuals. The simplest way is to select a subsample of unrelated individuals, but that clearly results in a loss of information. Family-based tests (Rabinowitz and Laird, 2000) can be used in some contexts. Such approaches have the benefit that they protect against population structure, but they can also be underpowered relative to case-control association tests in some circumstances (Risch and Teng, 1998; Thornton and McPeek, 2010). A third approach is to perform a standard $\chi^2$ test with a post-hoc correction factor applied. Slager and Schaid (2001) and Bourgain et al. (2003) have derived explicit correction factors for $\chi^2$ tests of association performed in samples of related individuals. Another way to obtain such a correction factor in the context of a genome screen would be to use the genomic control method of Devlin and Roeder (1999), which works well if the rate of missing genotypes is similar at all markers. A fourth approach is to try to make more efficient use of the available information to obtain a more powerful test than the one that results from correcting the standard $\chi^2$ test. This approach has been taken by Bourgain et al. (2003) and by Thornton and McPeek (2007), who showed that there can be a nonnegligible power gain for their MQLS method over the corrected $\chi^2$.
One purpose of this article is to better understand the nature of this power gain. One way in which the MQLS improves on the corrected $\chi^2$ is that it uses a more efficient estimator of allele frequency, which is a nuisance parameter in the analysis. Both the MQLS and corrected $\chi^2$ could also make use of unphenotyped controls who are not related to phenotyped individuals, but who are genotyped, to improve the nuisance parameter estimation. Additionally, the MQLS can make use of genotyped individuals who are unphenotyped but who have phenotyped relatives in the sample as well as phenotyped individuals who have missing genotypes but who have genotyped relatives in the sample. It is this last aspect of power gain, the gain from using information on related individuals with missing genotype or phenotype, that we study in this article. We derive a connection between the MQLS and imputation of missing genotypes based on genotypes of relatives, and we consider the question of how much power can be gained by squeezing this information from the data.
2. Methods
2.1. Best linear unbiased prediction of genotypes
First, we ignore phenotype and focus on the problem of predicting missing genotypes based on the observed genotypes of relatives. Suppose our study consists of n + m individuals, where we assume, without loss of generality, that the first n of the n + m individuals have non-missing genotype data at the marker under consideration, while the last m have missing genotype data at the marker. The n + m individuals can be arbitrarily related, with the pedigree(s) that specify the relationships assumed to be known. The method we describe is feasible even for large complex pedigrees with inbreeding loops. Unrelated individuals can also be included in the sample. If their genotypes are known, they will contribute to the prediction of missing genotypes in others by contributing to the allele frequency estimate. However, if individuals with missing genotypes are unrelated to others in the sample, then their genotypes will not be predicted by our method.
For the moment, we assume that the marker under consideration is an autosomal binary marker (e.g., a SNP) with alleles labeled “0” and “1.” We extend to multiple alleles in the next subsection. Let $G = (G_1, \ldots, G_{n+m})^T$ be the vector of true genotypes at the marker under consideration, where $G_i = 0$, $0.5$, or $1$, according to whether individual $i$ has, respectively, 0, 1, or 2 copies of allele 1 at the marker. We assume that the first $n$ entries of $G$ are observed, while the last $m$ entries are unobserved and are to be predicted. We use the notation $G = (G_N^T, G_M^T)^T$ to denote the partition of the vector $G$ into the $n$-vector $G_N$ of observed genotypes and the $m$-vector $G_M$ of unobserved genotypes.
Let $p$ represent the frequency of allele 1 in the population from which the pedigree founders are assumed to be drawn, where $0 < p < 1$. Then, we have $E(G) = p\mathbf{1}_{n+m}$, where $\mathbf{1}_{n+m}$ is a column vector of length $n + m$ with every entry equal to 1. If we further assume Hardy-Weinberg equilibrium (HWE) in the population from which the pedigree founders are drawn, then $\mathrm{Var}(G) = \sigma^2\Phi$, where $\sigma^2 = p(1-p)/2$ and $\Phi$ is the kinship matrix given by

$$\Phi = \begin{pmatrix} 1+h_1 & 2\phi_{1,2} & \cdots & 2\phi_{1,n+m} \\ 2\phi_{2,1} & 1+h_2 & \cdots & 2\phi_{2,n+m} \\ \vdots & \vdots & \ddots & \vdots \\ 2\phi_{n+m,1} & 2\phi_{n+m,2} & \cdots & 1+h_{n+m} \end{pmatrix} \tag{1}$$

where $\phi_{i,j}$ is the kinship coefficient between individuals $i$ and $j$, and $h_i$ is the inbreeding coefficient of individual $i$. To make the approach more robust to deviations from HWE, we can remove the assumption that $\sigma^2 = p(1-p)/2$ and simply assume $\mathrm{Var}(G) = \sigma^2\Phi$, where $\sigma^2$ is an unknown parameter that we estimate. Corresponding to the partition of $G$ into $G_N$ and $G_M$, we have the following partition of the $\Phi$ matrix:

$$\Phi = \begin{pmatrix} \Phi_N & \Phi_{N,M} \\ \Phi_{N,M}^T & \Phi_M \end{pmatrix} \tag{2}$$

where $\Phi_N$ is the $n \times n$ kinship matrix for the individuals with observed genotypes at the marker, $\Phi_{N,M}$ is the $n \times m$ matrix of kinship coefficients for pairs of individuals in which one individual has observed genotype and the other has unobserved genotype, and $\Phi_M$ is the $m \times m$ kinship matrix for the individuals with unobserved genotypes at the marker. Provided that the set of individuals with observed genotypes does not contain both members of any monozygotic (MZ) twin pair, the matrix $\Phi_N$ is invertible (Thornton and McPeek, 2007).
Under these conditions, the best linear unbiased estimator (BLUE) of $p$ is given by $\hat{p} = (\mathbf{1}^T\Phi_N^{-1}\mathbf{1})^{-1}\mathbf{1}^T\Phi_N^{-1}G_N$, and the variance of this estimator is $\mathrm{Var}(\hat{p}) = \sigma^2(\mathbf{1}^T\Phi_N^{-1}\mathbf{1})^{-1}$. McPeek et al. (2004) show that this is the BLUE based on the allele indicators for the set of genotyped individuals in the situation when parental origin of allele is not known.
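Now we propose to predict $G_M$ by finding its best linear unbiased predictor (BLUP) based on $G_N$. That is, among all fixed $m \times n$ real matrices $R$, we find the one that minimizes

$$E\left\{\left[c^T(G_M - RG_N)\right]^2\right\} \tag{3}$$

for every fixed $m$-vector $c$, subject to

$$E(G_M - RG_N) = 0. \tag{4}$$

Equivalently, we find the $R$ satisfying condition (4) such that $\mathrm{Var}(G_M - \tilde{R}G_N) - \mathrm{Var}(G_M - RG_N)$ is a positive semidefinite matrix for all $\tilde{R}$ satisfying condition (4). In our case, condition (4) reduces to $R\mathbf{1} = \mathbf{1}_m$, where we economize on subscripts by letting $\mathbf{1}$ always denote a vector of length $n$ with every entry equal to 1, while we let $\mathbf{1}_m$ denote a vector of length $m$ with every element equal to 1. In the Appendix, we show that the resulting $R$ is given by

$$R = \Phi_{N,M}^T\Phi_N^{-1} + \left(\mathbf{1}_m - \Phi_{N,M}^T\Phi_N^{-1}\mathbf{1}\right)\left(\mathbf{1}^T\Phi_N^{-1}\mathbf{1}\right)^{-1}\mathbf{1}^T\Phi_N^{-1} \tag{5}$$

so that the BLUP of $G_M$ is given by

$$\hat{G}_M = RG_N = \hat{p}\mathbf{1}_m + \Phi_{N,M}^T\Phi_N^{-1}\left(G_N - \hat{p}\mathbf{1}\right). \tag{6}$$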
The variance of the BLUP is
![]() |
(7) |
If p were known, only the first term of equation (7) would be present; the second term of the variance results from estimation of p.
The above development holds for the case when the set of individuals with observed genotypes does not contain both members of any MZ twin pair. If the set of genotyped individuals contains both members of one or more MZ twin pairs, then $\Phi_N$ is not invertible, but the above results hold with $\Phi_N^{-1}$ replaced by the Moore-Penrose generalized inverse, which we write as $\Phi_N^{-}$. Provided that the two members of any MZ twin pair have identical genotypes, use of $\Phi_N^{-}$ in place of $\Phi_N^{-1}$ in the above formulas is mathematically equivalent to setting the genotype of one of the two members of each MZ twin pair to be missing, in order to obtain invertible $\Phi_N$.
Advantages of the BLUP, $\hat{G}_M$, in equation (6), as a predictor of $G_M$ are that (1) it is extremely fast to calculate, even in large complex pedigrees with inbreeding loops, making it feasible to use in studies with large numbers of people and markers; and (2) its variance-covariance matrix, given in equation (7), is also very easy to calculate, making it relatively easy to incorporate the prediction uncertainty and correlation between predicted values into a genetic analysis. Note that for different markers, the sets $N$ and $M$ of individuals with, respectively, non-missing and missing genotype, will change, and so the BLUE weight vector $(\mathbf{1}^T\Phi_N^{-1}\mathbf{1})^{-1}\Phi_N^{-1}\mathbf{1}$ and the matrix $R$ will differ from marker to marker.
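The BLUE and BLUP calculations above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' software: the toy sample (two genotyped sibs, two genotyped unrelated individuals, and one ungenotyped sib of the first two), the helper names `solve` and `blue_and_blup`, and the use of exact fractions are all choices made for the example. The last step uses only the generic identity Var(R G_N) = σ² R Φ_N Rᵀ for a linear function of G_N.

```python
from fractions import Fraction as Fr

def solve(A, B):
    """Solve A X = B exactly by Gauss-Jordan elimination (A square, invertible)."""
    n = len(A)
    M = [[Fr(x) for x in A[i]] + [Fr(x) for x in B[i]] for i in range(n)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        d = M[col][col]
        M[col] = [x / d for x in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]

def blue_and_blup(phi_N, phi_NM, g_N):
    """Return (p_hat, R, blup) for kinship blocks phi_N (n x n), phi_NM (n x m)."""
    n, m = len(phi_N), len(phi_NM[0])
    u = [row[0] for row in solve(phi_N, [[1]] * n)]   # Phi_N^{-1} 1
    w = 1 / sum(u)                                    # (1' Phi_N^{-1} 1)^{-1}
    blue_wts = [w * ui for ui in u]                   # weights defining p_hat
    At = solve(phi_N, phi_NM)                         # Phi_N^{-1} Phi_NM (n x m)
    R = []
    for k in range(m):
        a_k = [At[i][k] for i in range(n)]            # row k of Phi_NM' Phi_N^{-1}
        b_k = 1 - sum(a_k)                            # correction for estimating p
        R.append([a_k[i] + b_k * blue_wts[i] for i in range(n)])
    p_hat = sum(x * g for x, g in zip(blue_wts, g_N))
    blup = [sum(x * g for x, g in zip(row, g_N)) for row in R]
    return p_hat, R, blup

h = Fr(1, 2)                                          # 2 * kinship for full sibs
phi_N = [[1, h, 0, 0], [h, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
phi_NM = [[h], [h], [0], [0]]
g_N = [Fr(1), h, Fr(0), h]                            # genotypes coded 0, 1/2, 1

p_hat, R, blup = blue_and_blup(phi_N, phi_NM, g_N)
# Variance of the linear predictor R G_N, in units of sigma^2: R Phi_N R'
var_over_sigma2 = [[sum(R[k][i] * phi_N[i][j] * R[l][j]
                        for i in range(4) for j in range(4))
                    for l in range(len(R))] for k in range(len(R))]
print(p_hat, R[0], blup[0], var_over_sigma2[0][0])
```

In this toy sample each row of `R` sums to 1, as the unbiasedness constraint requires, and the ungenotyped sib's prediction leans mostly on the two genotyped sibs while still borrowing a little from the unrelated individuals through the allele frequency estimate.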
2.2. BLUP for multi-allelic markers
Now suppose the marker under consideration is observed to have $a$ distinct alleles. Let $p = (p_1, \ldots, p_{a-1})^T$ be the vector of allele frequencies for alleles 1 through $a - 1$. (Note that $p_a = 1 - \sum_{j=1}^{a-1} p_j$, so $p_a$ is redundant and can be dropped from $p$.) For this subsection only, we redefine $G = (G^{(1)T}, \ldots, G^{(a-1)T})^T$, where $G^{(j)} = (G_1^{(j)}, \ldots, G_{n+m}^{(j)})^T$, with $G_i^{(j)}$ equal to 0, 1/2 or 1, according to whether individual $i$ has 0, 1, or 2 copies of the $j$th allele, $1 \le j \le a - 1$. Thus, the vector $G$ now has length $(n + m)(a - 1)$. As before, the first $n$ entries of each $G^{(j)}$ are observed, while the last $m$ entries are to be predicted. We write $G^{(j)} = (G_N^{(j)T}, G_M^{(j)T})^T$ to denote the partition of the vector $G^{(j)}$ into the $n$-vector $G_N^{(j)}$ of observed genotypes and the $m$-vector $G_M^{(j)}$ of unobserved genotypes.
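In the multi-allelic case, we have $E(G) = (I_{a-1} \otimes \mathbf{1}_{n+m})p$, with $\otimes$ denoting Kronecker product and $I_{a-1}$ denoting the identity matrix of dimension $a - 1$. Under HWE, we also have $\mathrm{Var}(G) = F \otimes \Phi$, where the $(a-1) \times (a-1)$ matrix $F$ has $(i,j)$th entry equal to $F_{ij} = -p_ip_j/2$ for $i \ne j$ and $F_{ii} = p_i(1 - p_i)/2$. As shown by McPeek et al. (2004), the BLUE of $p$ is given by $\hat{p} = (\hat{p}_1, \ldots, \hat{p}_{a-1})^T$, where $\hat{p}_j = (\mathbf{1}^T\Phi_N^{-1}\mathbf{1})^{-1}\mathbf{1}^T\Phi_N^{-1}G_N^{(j)}$, which is the BLUE for the frequency of allele $j$ that would be calculated if the marker were treated as biallelic with one allele being $j$ and all the other alleles being collapsed into a single “not-$j$” allele. Furthermore, $\mathrm{Var}(\hat{p}) = (\mathbf{1}^T\Phi_N^{-1}\mathbf{1})^{-1}F$.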
The BLUP of $G_M$ based on $G_N$ is found to be $\hat{G}_M = (\hat{G}_M^{(1)T}, \ldots, \hat{G}_M^{(a-1)T})^T$, where $\hat{G}_M^{(j)} = RG_N^{(j)}$ is the BLUP of $G_M^{(j)}$ that would be calculated if the marker were treated as biallelic with one allele being $j$ and all the other alleles being collapsed into a single “not-$j$” allele. We have
![]() |
(8) |
Equation (8) is similar to equation (7) with σ2 replaced by F ⊗ .
An important feature of the multiallelic BLUE and BLUP calculations is that they can be performed by, for example, calculating $(\mathbf{1}^T\Phi_N^{-1}\mathbf{1})^{-1}\mathbf{1}^T\Phi_N^{-1}$ and $R$ only once for a given multiallelic marker and then taking the inner products of these with the $a - 1$ different vectors $G_N^{(1)}, \ldots, G_N^{(a-1)}$.
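The reuse of one set of weights across alleles can be illustrated with a short Python sketch. Everything concrete here is an assumption made for the example: `blue_w` and `R` stand for hypothetical precomputed BLUE and BLUP weights for one marker (four genotyped individuals, one ungenotyped), and the three-allele genotypes are invented.

```python
from fractions import Fraction as Fr

# Hypothetical precomputed weights for one marker (assumed, not from the source).
blue_w = [Fr(1, 5), Fr(1, 5), Fr(3, 10), Fr(3, 10)]   # BLUE weights, sum to 1
R = [[Fr(2, 5), Fr(2, 5), Fr(1, 10), Fr(1, 10)]]      # BLUP weights, rows sum to 1

a = 3                                                 # number of distinct alleles
genotypes = [(1, 2), (1, 3), (2, 2), (3, 1)]          # allele pairs, genotyped people

def coded(j):
    """Vector of (copies of allele j) / 2 for the genotyped individuals."""
    return [Fr(g.count(j), 2) for g in genotypes]

def dot(w, g):
    return sum(x * y for x, y in zip(w, g))

# Same weights, a - 1 different genotype vectors.
p_hat = [dot(blue_w, coded(j)) for j in range(1, a)]
G_hat = [[dot(row, coded(j)) for row in R] for j in range(1, a)]

# The last allele is redundant: it is recovered by subtraction.
p_hat_last = 1 - sum(p_hat)
G_hat_last = [1 - sum(G_hat[j][k] for j in range(a - 1)) for k in range(len(R))]
print(p_hat, G_hat, p_hat_last, G_hat_last)
```

Because the weights in each row sum to 1 and the coded genotypes sum to 1 across alleles, the subtraction step gives the same values as applying the weights directly to the coded vector for allele $a$.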
2.3. Overview of the MQLS method for case-control association testing with related individuals
Now we suppose that, in addition to having genotype data, we also have case-control phenotype data, where we allow some phenotype values to be unknown. Let $D$ denote the phenotype data on the $n + m$ individuals, with each individual categorized as “affected,” “unaffected,” or “unknown phenotype.” Here, the designation “unknown phenotype” could be used to refer to, for example, an unphenotyped individual taken from a generic control panel. Alternatively, it could refer to an individual whose phenotype has not yet become apparent (e.g., for an age-related trait). For $1 \le i \le n + m$, we define the $i$th component of $D$ to be $D_i = 1$ if $i$ is affected, $D_i = 0$ if $i$ is unaffected, and $D_i = k$ if $i$ is of unknown phenotype, where $k$ is an externally obtained estimate of the population prevalence of the trait. We write $D = (D_N^T, D_M^T)^T$ to denote the partition of $D$ into the vectors $D_N$ and $D_M$, corresponding to individuals with non-missing and missing genotype, respectively, at the marker being tested. For simplicity, we describe the MQLS method for the case when the marker being tested is biallelic. The multiallelic case is given in Thornton and McPeek (2007).
We analyze the data retrospectively, i.e., we condition on $D$ and treat $G_N$ as random in the analysis. The retrospective approach is appropriate, for example, with either random or phenotype-based ascertainment. Our null hypothesis is that there is no association and no linkage between the marker being tested and the trait. Under the null hypothesis, we assume that $E_0(G_N \mid D) = p\mathbf{1}$ and $\mathrm{Var}_0(G_N \mid D) = \sigma^2\Phi_N$. The test statistic for the MQLS method is given by

$$\frac{\left(V^TG_N\right)^2}{\hat{\sigma}^2\, V^T\Phi_N V} \tag{9}$$

with

$$V = \left[I_n - \Phi_N^{-1}\mathbf{1}\left(\mathbf{1}^T\Phi_N^{-1}\mathbf{1}\right)^{-1}\mathbf{1}^T\right]\left[D_N - k\mathbf{1} + \Phi_N^{-1}\Phi_{N,M}\left(D_M - k\mathbf{1}_m\right)\right] \tag{10}$$

where $I_n$ is the $n \times n$ identity matrix. For $\hat{\sigma}^2$, we typically use the estimator of $\sigma^2$ proposed by Thornton and McPeek (2010). Under the null hypothesis of no linkage and no association between the marker and the trait, the MQLS statistic given in equation (9) is asymptotically $\chi^2_1$ distributed.
2.4. Previous interpretations of the MQLS method
There are several possible ways of understanding the MQLS statistic of equation (9). The original development of the MQLS came from the fact that it is the quasi-likelihood score test of the null hypothesis $H_0 : \gamma = 0$ in the retrospective model $E(G_N \mid D) = p\mathbf{1} + \gamma[\Phi_N(D_N - k\mathbf{1}) + \Phi_{N,M}(D_M - k\mathbf{1}_m)]$, where $\gamma$ represents the association parameter. For a genotyped individual $i$, this conditional expectation can be written

$$E(G_i \mid D) = p + \gamma\sum_{j=1}^{n+m} 2\phi_{i,j}\left(D_j - k\right) \tag{11}$$

where we let $2\phi_{i,i} = 1 + h_i$. In expression (11), the association parameter, $\gamma$, is multiplied by a weighted sum of centered phenotype values, where the weight of individual $j$'s centered phenotype is twice the kinship coefficient of individual $j$ with individual $i$. This results in an enrichment effect, i.e., individuals with multiple affected relatives are assumed to have a higher chance of carrying a causal allele than individuals without affected relatives. For outbred individuals, it can be shown (Thornton and McPeek, 2007) that this retrospective model holds, up to terms of order $o(\gamma)$, assuming any prospective, two-allele, disease model, when the effect size $\gamma$ is close to 0.
A later development (Wang and McPeek, 2009) shows that the MQLS is closely connected to a retrospective likelihood score test based on the following prospective model:
. Here P0(D) is the null model for the joint distribution of phenotype values in the absence of association with the marker being tested, where arbitrary dependence among phenotypes of related individuals is allowed. It is actually not necessary to specify the form of P0(D). c(G,r) is simply a normalizing constant. Logistic regression can be obtained as a special case of this prospective model. If we take into account incomplete data in deriving a retrospective likelihood score test for r = 0 under this class of models, then it is asymptotically equivalent to the MQLS test, where the difference between them arises from the fact that the MQLS test estimates the nuisance parameter p by the best linear unbiased estimator (BLUE), while the retrospective likelihood score test uses the maximum likelihood estimator of p under the null hypothesis.
2.5. A new interpretation of the MQLS method
We describe a novel interpretation of the MQLS statistic, which is somewhat different in flavor from previous interpretations and, therefore, can be illuminating. We note that the expression $V^TG_N$ in the numerator of the MQLS test statistic of equation (9) can be rewritten as

$$\begin{aligned} V^TG_N &= (D_N - k\mathbf{1})^T(G_N - \hat{p}\mathbf{1}) + (D_M - k\mathbf{1}_m)^T(\hat{G}_M - \hat{p}\mathbf{1}_m) \\ &= \sum_{j \in L \cap N}(D_j - k)(G_j - \hat{p}) + \sum_{j \in L \cap M}(D_j - k)(\hat{G}_j - \hat{p}) \end{aligned} \tag{12}$$

where $L$ is the set of phenotyped individuals, $\hat{p}$ is the BLUE of allele frequency, and $\hat{G}_j$ is the BLUP of $G_j$ given by equation (6). The first sum is taken over all individuals who are both phenotyped and genotyped, and it represents the inner product of their genotypic and phenotypic residuals, where phenotype is centered around the externally derived prevalence estimate, $k$, and genotype is centered around the BLUE of allele frequency. The second sum is over all phenotyped individuals who have missing genotype at the marker being tested, in which case we replace the missing genotype $G_j$ by its BLUP under the null hypothesis, $\hat{G}_j$. Thus, we can interpret the MQLS as involving a form of imputation of missing genotypes by their BLUPs based on genotyped relatives. The main advantage of this form of imputation is that, under the null hypothesis, the uncertainty in imputation and dependence in imputation across individuals is exactly taken into account in the variance that appears in the denominator of equation (9).
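The imputation reading of the numerator can be checked numerically in a simple design. The sketch below is an illustration under stated assumptions: sib pairs with exactly one genotyped sib each, so that Φ_N is the identity and Φ_{N,M} is half the identity; the weight vector is taken to be V = a − ā1 with a = (D_N − k1) + ½(D_M − k1), which is the form the quasi-likelihood score weights reduce to when Φ_N is the identity; and all data values are invented.

```python
from fractions import Fraction as Fr

k = Fr(1, 10)                                   # assumed prevalence estimate
G_N = [Fr(0), Fr(1, 2), Fr(1), Fr(1, 2), Fr(0)] # genotyped sib in each pair
D_N = [1, 1, 0, 0, 1]                           # their phenotypes
D_M = [1, 0, 0, 1, k]                           # ungenotyped sibs; last one unphenotyped

n = len(G_N)
p_hat = sum(G_N) / n                            # BLUE of p when Phi_N = I
# BLUP of each missing genotype: p_hat + (1/2)(sib's genotype - p_hat)
G_hat_M = [p_hat + Fr(1, 2) * (g - p_hat) for g in G_N]

# Weight vector V under the assumptions stated above
a = [(dn - k) + Fr(1, 2) * (dm - k) for dn, dm in zip(D_N, D_M)]
a_bar = sum(a) / n
V = [x - a_bar for x in a]

lhs = sum(v * g for v, g in zip(V, G_N))        # V' G_N
rhs = (sum((d - k) * (g - p_hat) for d, g in zip(D_N, G_N))
       + sum((d - k) * (gh - p_hat) for d, gh in zip(D_M, G_hat_M)))
print(lhs, rhs)
```

The unphenotyped individual is coded as $D_j = k$, so that term drops out of the second sum automatically, which is why the sums can be restricted to phenotyped individuals.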
2.6. How much power does BLUP imputation of missing genotypes add to the analysis?
To assess the effect, on the analysis, of BLUP imputation of missing genotypes, we perform analytical power calculations using a noncentral $\chi^2_1$ approximation to the alternative distribution of the MQLS statistic of equation (9). We illustrate this computation in two examples. In each case, we obtain a noncentrality parameter $\lambda$, and then calculate power at level $10^{-3}$ as $1 - R_{\lambda,1}(K)$, where $R_{\lambda,1}$ is the cumulative distribution function of a noncentral $\chi^2_1$ with noncentrality parameter $\lambda$, and $K \approx 10.82757$ is the upper $10^{-3}$ quantile of a $\chi^2_1$ distribution, i.e., $1 - R_{0,1}(K) = 10^{-3}$.
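This power calculation can be sketched with the Python standard library alone. The sketch relies on the standard fact that a noncentral chi-square variable with 1 degree of freedom and noncentrality λ has the distribution of Z² with Z normal, mean √λ and variance 1. The function names are hypothetical.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def power_1df(lam, K=10.82757):
    """P(X > K) for X noncentral chi-square, 1 df, noncentrality lam.

    Uses X = Z^2 with Z ~ N(sqrt(lam), 1), so
    P(X > K) = P(Z > sqrt(K)) + P(Z < -sqrt(K)).
    """
    r, d = math.sqrt(K), math.sqrt(lam)
    return norm_cdf(d - r) + norm_cdf(-d - r)

for lam in (0.0, 5.0, 10.0, 20.0):
    print(lam, power_1df(lam))
```

At λ = 0 the function returns the nominal level, and power increases with λ, so comparing designs reduces to comparing their noncentrality parameters.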
Example 1. Sib pairs
In this example, we assume that there are $f$ sampled families, with each family consisting of a sib pair whose phenotypes are known. We assume that, in the resulting sample of size $2f$, the proportion of affected individuals is $\mu$, and the correlation of affection status between sibs in the study is $\rho$. (Note that both $\mu$ and $\rho$ reflect ascertainment. For example, if discordant sib pairs were preferentially ascertained, then $\rho$ could be negative in the sample, even if there were positive sib-sib correlation for the trait in the general population.) For all 3 cases below, we use the mean model given in equation (11). We define a value $s$, which we call the scaled genetic effect, by $s = \sigma^{-2}f\mu(1 - \mu)\gamma^2$, and we assume $s > 0$. All the noncentrality parameters we calculate are proportional to $s$.
Case 1. Everyone genotyped
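When all $2f$ individuals are both phenotyped and genotyped at the marker being tested, then we have $n = 2f$, $m = 0$, and, with the two sibs of each family listed consecutively,

$$\Phi_N = I_f \otimes \begin{pmatrix} 1 & 1/2 \\ 1/2 & 1 \end{pmatrix}. \tag{13}$$

Plugging into equation (10), we obtain $v_{\text{complete}} = D_N - \mu\mathbf{1}_{2f}$. Then the noncentrality parameter is

$$\lambda_{\text{complete}} = s(2 + \rho),$$

where $s$ is defined in the previous paragraph.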
Case 2. One sib in each pair genotyped, BLUP used
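In this case, both sibs in each pair are phenotyped, and the sib who has missing genotype in each pair is chosen at random. Then $n = m = f$, $\Phi_N = I_f$, $\Phi_{N,M} = \frac{1}{2}I_f$, and, plugging into equation (10), we obtain $v_{\text{partial}} = (D_N - \mu\mathbf{1}_f) + \frac{1}{2}(D_M - \mu\mathbf{1}_f)$. Then the noncentrality parameter that results is

$$\lambda_{\text{partial}} = s\left(\tfrac{5}{4} + \rho\right).$$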
Case 3. One sib in each pair genotyped, missing sib discarded
In this case, although both sibs' phenotypes are observed, in each family, the sib with missing genotype is discarded from the analysis. This is in contrast to Case 2, in which that individual is still included. The calculation of $E(G_N \mid D)$ in Case 3 is exactly the same as for Case 2. However, for calculation of $v_{\text{dropped}}$, we use $n = f$, $m = 0$, and $\Phi_N = I_f$. Then, plugging into equation (10), we obtain $v_{\text{dropped}} = D_N - \mu\mathbf{1}_f$. The resulting noncentrality parameter is $\lambda_{\text{dropped}} = s(1 + \rho/2)^2$.
By comparison of Cases 1, 2, and 3 for sib pairs, we see that $\lambda_{\text{complete}} > \lambda_{\text{partial}} \ge \lambda_{\text{dropped}}$, with equality between $\lambda_{\text{partial}}$ and $\lambda_{\text{dropped}}$ only when $\rho = 1$ or $-1$. The greatest difference between $\lambda_{\text{partial}}$ and $\lambda_{\text{dropped}}$ occurs when $\rho = 0$. This makes sense because $\rho = 1$ or $-1$ corresponds to the case when the phenotype information on the ungenotyped individuals is completely redundant and provides no new information, while $\rho = 0$ corresponds to the phenotype information on the ungenotyped individuals being maximally informative (or at least, not well-predicted based on the phenotypes of their sibs). Power plots examining these three cases are given in Results.
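The ordering of the three sib-pair noncentrality parameters can be checked with a short Python sketch. The closed forms used below were rederived for this illustration from the mean model (11) and the sib-pair kinship values, treating the mean phenotype in each subsample as equal to μ; they should be read as assumptions of the sketch. The function name is hypothetical.

```python
def sib_pair_lambdas(s, rho):
    """Noncentrality parameters (complete, partial, dropped) for sib pairs.

    Assumed closed forms, each proportional to the scaled genetic effect s:
      complete: both sibs genotyped
      partial : one sib genotyped, the other imputed by its BLUP
      dropped : one sib genotyped, the other discarded
    """
    complete = s * (2 + rho)
    partial = s * (1.25 + rho)
    dropped = s * (1 + rho / 2) ** 2
    return complete, partial, dropped

grid = [-1.0, -0.5, 0.0, 0.5, 1.0]
# partial - dropped simplifies to s * (1 - rho^2) / 4
gaps = {rho: sib_pair_lambdas(1.0, rho)[1] - sib_pair_lambdas(1.0, rho)[2]
        for rho in grid}
for rho in grid:
    print(rho, sib_pair_lambdas(1.0, rho), gaps[rho])
```

On this grid the gain from BLUP imputation over discarding vanishes at ρ = ±1 and peaks at ρ = 0, matching the comparison described in the text.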
Example 2. Sib quartets
The reason for considering the sib quartets example is that it is similar to the sib pairs example in many respects, except that, with two typed sibs per family, there is more information available on the missing genotypes, so the BLUP imputation might be expected to be more informative. We assume that there are $f$ sampled families with each family consisting of a sib quartet whose phenotypes are known. In the resulting sample of $4f$ individuals, we assume that the proportion of affected individuals is $\mu$, and the correlation of affection status between sibs in the study is $\rho$. For the sake of comparison, we use the same scaled genetic effect, $s = \sigma^{-2}f\mu(1 - \mu)\gamma^2$, that is used in the sib pair example.
Case 1. Everyone genotyped
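When all $4f$ individuals are both phenotyped and genotyped at the marker being tested, then we have $n = 4f$, $m = 0$, and, with the four sibs of each family listed consecutively,

$$\Phi_N = I_f \otimes \begin{pmatrix} 1 & 1/2 & 1/2 & 1/2 \\ 1/2 & 1 & 1/2 & 1/2 \\ 1/2 & 1/2 & 1 & 1/2 \\ 1/2 & 1/2 & 1/2 & 1 \end{pmatrix}. \tag{14}$$

We obtain $v_{\text{complete}} = D_N - \mu\mathbf{1}_{4f}$, and $\lambda_{\text{complete}} = s(4 + 6\rho)$. Note that with four sibs, if every pair has correlation $\rho$, then we have the constraint $-1/3 \le \rho \le 1$, so $\lambda_{\text{complete}} \ge 2s > 0$.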
Case 2. Two sibs in each quartet genotyped, BLUP used
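In this case, all four sibs in each quartet are phenotyped, and the 2 sibs who have missing genotype in each quartet are chosen at random. Then $n = m = 2f$, $\Phi_N$ has the same form as in equation (13), and

$$\Phi_{N,M} = I_f \otimes \begin{pmatrix} 1/2 & 1/2 \\ 1/2 & 1/2 \end{pmatrix}. \tag{15}$$

We obtain $v_{\text{partial}} = (D_N - \mu\mathbf{1}_{2f}) + \frac{1}{3}(D_M + \tilde{D}_M - 2\mu\mathbf{1}_{2f})$, where

$$\tilde{D}_M = \left(D_{M,2}, D_{M,1}, D_{M,4}, D_{M,3}, \ldots, D_{M,2f}, D_{M,2f-1}\right)^T \tag{16}$$

is a permutation of the vector $D_M$ such that the two ungenotyped sibs in each family have their phenotypes interchanged. Then $\lambda_{\text{partial}} = s(8 + 17\rho)/3$.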
Case 3. Two sibs in each quartet genotyped, missing sibs discarded
In this case, although all four sibs' phenotypes are observed, in each family, the two sibs with missing genotypes are discarded from the analysis. We calculate $E(G_N \mid D)$ in the same way as in Case 2. However, for calculation of $v_{\text{dropped}}$, we use $n = 2f$, $m = 0$, and the same $\Phi_N$ as in equation (13). Then $v_{\text{dropped}} = D_N - \mu\mathbf{1}_{2f}$, and $\lambda_{\text{dropped}} = s(2 + 3\rho)^2/(2 + \rho)$.
By comparison of Cases 1, 2, and 3 for sib quartets, we see $\lambda_{\text{complete}} > \lambda_{\text{partial}} \ge \lambda_{\text{dropped}}$, with equality between $\lambda_{\text{partial}}$ and $\lambda_{\text{dropped}}$ only when $\rho = 1$ (recall the constraint $-1/3 \le \rho \le 1$ for sib quartets). Power plots for all cases are given in Results.
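The sib-quartet noncentrality parameters can also be obtained numerically from per-family matrices, which gives an independent check on the comparison above. The sketch below is an illustration under stated assumptions: per family, the centered phenotype vector x has exchangeable covariance proportional to (1 − ρ)I + ρ11ᵀ; the test weights are V = Wx; the mean shift under model (11) is γTx, with T the rows of the kinship matrix for the genotyped sibs; and the null variance is σ²VᵀBV with B the kinship block of the genotyped sibs. The weight 1/3 in `W_partial` comes from Φ_N⁻¹Φ_{N,M} for two genotyped and two ungenotyped sibs. All names are hypothetical.

```python
from fractions import Fraction as Fr

h, t = Fr(1, 2), Fr(1, 3)

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def tr_C(M, rho):
    """trace(M C) for the exchangeable matrix C = (1 - rho) I + rho 1 1'."""
    return (1 - rho) * sum(M[i][i] for i in range(len(M))) + rho * sum(map(sum, M))

def lam_over_s(W, T, B, rho):
    """Noncentrality parameter divided by s for weights W, mean map T, kinship B."""
    Wt = transpose(W)
    num = tr_C(matmul(Wt, T), rho)              # expected shift of V' G_N (per gamma)
    den = tr_C(matmul(matmul(Wt, B), W), rho)   # null variance of V' G_N (per sigma^2)
    return num ** 2 / den

I4 = [[Fr(int(i == j)) for j in range(4)] for i in range(4)]
Phi4 = [[Fr(1) if i == j else h for j in range(4)] for i in range(4)]
B2 = [row[:2] for row in Phi4[:2]]              # kinship block of the 2 genotyped sibs
T2 = Phi4[:2]                                   # their kinship with all 4 sibs
W_partial = [[1, 0, t, t], [0, 1, t, t]]        # BLUP imputation of sibs 3 and 4
W_dropped = [[1, 0, 0, 0], [0, 1, 0, 0]]        # sibs 3 and 4 discarded

def quartet_lambdas(rho):
    return (lam_over_s(I4, Phi4, Phi4, rho),
            lam_over_s(W_partial, T2, B2, rho),
            lam_over_s(W_dropped, T2, B2, rho))

for rho in (Fr(-1, 3), Fr(0), Fr(1, 4), Fr(1)):
    print(rho, quartet_lambdas(rho))
```

Working with exact fractions makes it easy to see that the partial and dropped values coincide only at ρ = 1 within the admissible range.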
3. Results
Figure 1 shows power results, at significance level $10^{-3}$, for $f$ sampled families, where each family is either a sib pair or a sib quartet. In each case, the solid line represents the situation when all members of the sibship are available to be analyzed (Case 1 above). The dotted line represents the situation when all individuals are phenotyped, but only half the members of the sibship are genotyped, and the individuals with missing genotype are simply dropped from the analysis (Case 3 above). The dashed line represents the situation when phenotype data are available on all members of the sibship, but genotype data are available on only half the members, and the MQLS is used to analyze the data, which is equivalent to BLUP imputation of the missing genotypes for the other half of the sibship (Case 2 above). We expect the dashed line (BLUP imputation) to be intermediate between the dotted line (individuals with missing genotype discarded) and the solid line (complete information on the individuals), and it is of interest to get an idea of how much of the full information can be recovered by the BLUP, in the context of association testing.
FIG. 1. How much power for association is recovered by using BLUPs of missing genotypes? Approximate power at level $10^{-3}$, as calculated using a non-central chi-square approximation, vs. scaled genetic effect, given by $\sigma^{-2}f\mu(1 - \mu)\gamma^2$, for sib pairs or sib quartets and for different values of the correlation between sampled sib phenotypes, where this correlation would depend on ascertainment. In each plot, the solid line represents the situation in which the phenotypes and genotypes of all sibs are observed, representing the gold standard of perfect recovery of missing genotypes. The dashed line represents the situation in which the genotypes of half of the sibs (1 sib in a sib pair or 2 sibs in a sib quartet) are not observed, and they are incorporated into the MQLS statistic, which is equivalent to BLUP imputation. The dotted line represents the situation in which the genotypes of half the sibs are not observed, and the ungenotyped sibs are simply removed from the analysis.
The sib quartet study design represents a larger sample, so it is to be expected that power should be higher than for the sib pair study design. In addition, we can see that the power is higher overall when the study design is such that the correlation between sampled sibs' phenotypes is higher. One explanation for this is the enrichment effect: for complex traits, an affected individual with an affected sibling is more likely to carry a particular variant associated with the phenotype than is an affected individual with an unaffected sibling. By the same token, an unaffected individual with an unaffected sibling is more likely to carry a particular protective variant than is an unaffected individual with an affected sibling. Therefore, we might expect to have higher power to detect a genetic effect when we sample individuals who have relatives with similar trait values. By comparing the results from complete data on the sibship (solid lines) to the results that ignore half of the sibs (dotted lines) for different values of the correlation, we see that the higher the phenotype correlation between the sibs, the less important it is to have the missing sib(s) in the analysis (solid and dotted lines get closer). This also seems reasonable, because if the missing sibs' phenotype(s) are well-predicted by the observed sibs' phenotype(s), then less new information is gained by including the additional sib(s).
Finally, we can see that in all 6 cases, the use of the BLUP imputation for missing genotypes (i.e., MQLS test) can provide a moderate power increase over ignoring individuals with missing genotypes. In particular, the MQLS method seems more effective with the sib quartet design, which is to be expected, because in that case there is more information available, on related individuals, to predict the missing genotypes.
4. Discussion
We describe an interesting connection between the MQLS method, for case-control association testing in samples of related individuals, and the imputation of missing genotypes by the best linear unbiased predictor based on relatives' genotypes. In examples, we show that the use of BLUP to predict missing genotypes can add a reasonable amount of power to detect association. The amount of power added is higher when there are more typed relatives available to improve the prediction of missing genotypes.
The BLUP imputation described here is single-point. In contrast, most current genotype imputation methods (Scheet and Stephens, 2006; Browning and Browning, 2009) use information across many markers. However, with related individuals, imputed genotypes are dependent among relatives, where the dependence among imputed genotypes differs from the ordinary dependence among genotypes and is affected by the type and amount of information available for each individual. For association mapping, this complex dependence among imputed genotypes would need to be taken into account in the analysis in order to construct a valid test for association. A key feature of the single-point BLUP imputation we describe here is that the dependence among imputed genotypes is exactly taken into account in the construction of the MQLS test, in a way that is fast and computationally feasible even for large, inbred pedigrees.
The BLUP that we use is constructed assuming that there is no population structure beyond that captured by $\Phi$. Because the genotype prediction for an individual is based on genotypes of close relatives, one would expect it to be robust to mild population structure. The main difficulty for the BLUP would seem to be the possibility that the BLUE of allele frequency, $\hat{p}$, which is the centering value for the BLUP, could be inappropriate in cases of highly differentiated markers, when the allele frequencies are very different in different subpopulations represented in the sample. If information on population structure were available (e.g., in the form of structure-capturing vectors), then this information could be used to replace the vectors $\hat{p}\mathbf{1}_m$ and $\hat{p}\mathbf{1}$ of equation (6) by vectors in which the entry for the $i$th individual is an ancestry-specific estimated allele frequency. Alternatively, with mild population structure, in the context of case-control association testing, the ROADTRIPS method (Thornton and McPeek, 2010) could be used. ROADTRIPS is a more robust form of MQLS in which an estimated structure matrix is used to correct the variance of the test statistic for misspecified relationships in $\Phi$ as well as for mild population structure.
5. Appendix
5.1. Proof that the BLUP is given by equations (5) and (6) with variance in equation (7)
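As mentioned in subsection 2.1, we need to find the $m \times n$ matrix $\tilde{R}$ minimizing $E\{[c^T(G_M - \tilde{R}G_N)]^2\}$ for every $m$-vector $c$, subject to condition (4), which is $E(G_M - \tilde{R}G_N) = 0$. Note that condition (4) implies $E[c^T(G_M - \tilde{R}G_N)] = 0$, so we have $E\{[c^T(G_M - \tilde{R}G_N)]^2\} = \mathrm{Var}[c^T(G_M - \tilde{R}G_N)]$. Define $R$ by equation (5). We can trivially write $G_M - \tilde{R}G_N = (G_M - RG_N) + (R - \tilde{R})G_N$.
Claim
$\mathrm{Cov}[G_M - RG_N, (R - \tilde{R})G_N] = 0$.
Proof
We have $\mathrm{Cov}[G_M - RG_N, (R - \tilde{R})G_N] = \sigma^2(\Phi_{N,M}^T - R\Phi_N)(R - \tilde{R})^T$. Applying equation (5), we get $\Phi_{N,M}^T - R\Phi_N = -(\mathbf{1}_m - \Phi_{N,M}^T\Phi_N^{-1}\mathbf{1})(\mathbf{1}^T\Phi_N^{-1}\mathbf{1})^{-1}\mathbf{1}^T$. Condition (4) and equation (5) imply $(R - \tilde{R})\mathbf{1} = 0$, so we have $(\Phi_{N,M}^T - R\Phi_N)(R - \tilde{R})^T = 0$, which proves the claim.
Thus, $\mathrm{Var}[c^T(G_M - \tilde{R}G_N)] = \mathrm{Var}[c^T(G_M - RG_N)] + \mathrm{Var}[c^T(R - \tilde{R})G_N]$. This is minimized for every $m$-vector $c$ by $\tilde{R} = R$. When $\Phi_N$ is invertible, $R$ is the unique minimizer, and the unique BLUP is given by $RG_N$. (When the set of individuals with observed genotypes contains both members of one or more MZ twin pairs, $\Phi_N$ is not invertible, and a minimizer $R$ can be obtained by replacing $\Phi_N^{-1}$ with $\Phi_N^{-}$ in equation (5). In that case, $R$ is not the unique minimizer, but any other minimizer $R^*$ satisfies $R^*G_N = RG_N$, provided that the two members of any MZ twin pair have identical genotypes, so $RG_N$ is still the unique BLUP.) Expression (6) follows immediately from expression (5), using $\hat{p} = (\mathbf{1}^T\Phi_N^{-1}\mathbf{1})^{-1}\mathbf{1}^T\Phi_N^{-1}G_N$, where both $\mathbf{1}^T\Phi_N^{-1}\mathbf{1}$ and $\mathbf{1}^T\Phi_N^{-1}G_N$ are scalars. Expression (7) follows from the fact that Var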
.
5.2. Proof of equation (12)
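We have $V^TG_N = [D_N - k\mathbf{1} + \Phi_N^{-1}\Phi_{N,M}(D_M - k\mathbf{1}_m)]^T(G_N - \hat{p}\mathbf{1}) = (D_N - k\mathbf{1})^T(G_N - \hat{p}\mathbf{1}) + (D_M - k\mathbf{1}_m)^T(\hat{G}_M - \hat{p}\mathbf{1}_m)$, which is the first line of equation (12). The second line of equation (12) follows by noting that $D_j - k = 0$ for every genotyped individual $j$ who is not in $L$, and similarly for every ungenotyped individual $j$ who is not in $L$.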
Acknowledgments
This study was supported in part by the National Institutes of Health (grant R01 HG001645).
Disclosure Statement
No competing financial interests exist.
References
- Bourgain C. Hoffjan S. Nicolae R., et al. Novel case-control test in a founder population identifies P-selectin as an atopy susceptibility locus. Am. J. Hum. Genet. 2003;73:612–626. doi: 10.1086/378208.
- Browning S.R. Browning B.L. Rapid and accurate haplotype phasing and missing data inference for whole genome association studies using localized haplotype clustering. Am. J. Hum. Genet. 2007;81:1084–1097. doi: 10.1086/521987.
- Devlin B. Roeder K. Genomic control for association studies. Biometrics. 1999;55:997–1004. doi: 10.1111/j.0006-341x.1999.00997.x.
- McPeek M.S. Wu X. Ober C. Best linear unbiased allele-frequency estimation in complex pedigrees. Biometrics. 2004;60:359–367. doi: 10.1111/j.0006-341X.2004.00180.x.
- Rabinowitz D. Laird N. A unified approach to adjusting association tests for population admixture with arbitrary pedigree structure and arbitrary missing marker information. Hum. Hered. 2000;50:211–223. doi: 10.1159/000022918.
- Risch N. Teng J. The relative power of family-based and case-control designs for linkage disequilibrium studies of complex human diseases I. DNA pooling. Genome Res. 1998;8:1273–1288. doi: 10.1101/gr.8.12.1273.
- Scheet P. Stephens M. A fast and flexible statistical method for large-scale population genotype data: applications to inferring missing genotypes and haplotypic phase. Am. J. Hum. Genet. 2006;78:629–644. doi: 10.1086/502802.
- Slager S. Schaid D.J. Evaluation of candidate genes in case-control studies: a statistical method to account for related subjects. Am. J. Hum. Genet. 2001;68:1457–1462. doi: 10.1086/320608.
- Thornton T. McPeek M.S. Case-control association testing with related individuals: a more powerful quasi-likelihood score test. Am. J. Hum. Genet. 2007;81:321–337. doi: 10.1086/519497.
- Thornton T. McPeek M.S. ROADTRIPS: Case-control association testing with partially or completely unknown population and pedigree structure. Am. J. Hum. Genet. 2010;86:172–184. doi: 10.1016/j.ajhg.2010.01.001.
- Wang Z. McPeek M.S. An incomplete-data quasi-likelihood approach to haplotype-based genetic association studies on related individuals. JASA. 2009;104:1251–1260. doi: 10.1198/jasa.2009.tm08507.