Author manuscript; available in PMC: 2017 Sep 1.
Published in final edited form as: Genet Epidemiol. 2016 Jun 17;40(6):502–511. doi: 10.1002/gepi.21985

Family-based Rare Variant Association Analysis: a Fast and Efficient Method of Multivariate Phenotype Association Analysis

Longfei Wang 1, Sungyoung Lee 1, Jungsoo Gim 2, Dandi Qiao 3,4, Michael Cho 3,5, Robert C Elston 6, Edwin K Silverman 3,5, Sungho Won 1,7,*
PMCID: PMC4981535  NIHMSID: NIHMS788490  PMID: 27312886

Abstract

Motivation

Family-based designs have been repeatedly shown to be powerful for detecting rare variants associated with human diseases. Furthermore, human diseases are often defined by the outcomes of multiple phenotypes, so we expect that multivariate family-based analyses may be very efficient in detecting associations with rare variants. However, few statistical methods implementing this strategy have been developed for family-based designs. In this report, we describe one such implementation: the multivariate family-based rare variant association tool (mFARVAT).

Results

mFARVAT is a quasi-likelihood-based score test for rare variant association analysis with multiple phenotypes, and tests both homogeneous and heterogeneous effects of each variant on multiple phenotypes. Simulation results show that the proposed method is generally robust and efficient for various disease models, and we identify some promising candidate genes associated with chronic obstructive pulmonary disease.

Keywords: Family-based design, rare variants, association analysis, multivariate phenotypes

1 INTRODUCTION

In spite of tens of thousands of genome-wide association studies (GWAS), the so-called missing heritability (Manolio, Collins et al. 2009) reveals that analyses of common variants detect only a limited number of disease susceptibility loci and that a substantial number of causal variants may remain undiscovered by GWAS. Sequencing technology was expected to supply this additional information by obtaining large stretches of DNA spanning the entire genome, and improvements in this technology have enabled genetic association analysis of both rare and common causal variants. However, the ‘common disease rare variant’ hypothesis implies that multiple rare variants can affect disease status, so the proportion of affected individuals sharing the same causal variants could be very small. Therefore, analyses of rare variants suffer from genetic heterogeneity among affected individuals. In this context, because affected relatives are more likely to share the same causal variants (Shi and Rao 2011), the genetic heterogeneity among affected relatives is expected to be smaller, and family-based analyses have repeatedly been proposed as an important strategy.

Genetic association analyses simultaneously test a large number of variants, and the stringent significance levels imposed by multiple testing highlight the importance of powerful strategies. In particular, multiple measurements can be obtained from different but related phenotypes, or from repeated measurements of a single phenotype at different time points. Association analyses with multiple phenotypes often lead to substantial improvements in statistical power (Schifano, Li et al. 2013), and such improvements are inversely related to the correlations between phenotypes (Lee, Park et al. 2014). Several methods have been proposed, including the scaled marginal model (Schifano, Li et al. 2013) and the extended Simes procedure for population-based samples (van der Sluis, Posthuma et al. 2013). The statistical power of these methods depends on the relationships between the causal variants and the multiple phenotypes, which are usually unknown (van der Sluis, Posthuma et al. 2013), and the same holds for rare variant association analyses. For instance, if the effects of the rare variants on each of the multiple phenotypes are in the same direction, the burden test may be most efficient; but if the genetic effects are heterogeneous, SKAT may be more appropriate (Lee, Wu et al. 2012).

However, phenotypic correlation between family members complicates parameter estimation, particularly for dichotomous phenotypes. In this situation, FBAT statistics (Laird, Horvath et al. 2000) are among the very few approaches available for multivariate genetic association analyses with large families. FBAT statistics preserve robustness against population substructure and have been extended for joint analysis of multiple phenotypes and genotypes (Gray-McGuire, Bochud et al. 2009), and for rare variant association analysis (Yip, De et al. 2011). However, FBAT statistics do not fully use the information in the parental phenotypes, and the loss of power can be substantial if the number of founders is relatively large.

Recently, the FAmily-based Rare Variant Association Test (FARVAT) based on quasi-likelihood was proposed (Choi, Lee et al. 2014). FARVAT is robust against population substructure, and includes burden, SKAT and SKAT-O statistics for both dichotomous and quantitative phenotypes. In this report, we extend FARVAT to implement the multivariate family-based rare variant association analysis tool (mFARVAT). mFARVAT includes both homogeneous and heterogeneous approaches, and, in this respect, is similar to skatMeta (Lee, Teslovich et al. 2013). The method can analyze both quantitative and dichotomous phenotypes, and is robust against population substructure if the correlation matrix between individuals can be estimated from large-scale genetic data. mFARVAT is implemented in C++, and is computationally fast even for extended families. Furthermore, mFARVAT was applied to multiple phenotypes associated with chronic obstructive pulmonary disease (COPD), and some promising results illustrate its practical value.

2 METHODS

For genetic association analyses, either prospective or retrospective approaches can be used, and the choice depends on the sampling scheme. However, it has been shown that even for prospectively selected samples, retrospective analyses can retain nearly the same statistical power as prospective analyses. Additionally, retrospective strategies are robust against non-normality of phenotypes and are computationally less intensive (Won and Lange 2013). Therefore, we consider retrospective analysis for both prospectively and retrospectively selected samples, and genetic association is detected by testing whether the genotype distribution is independent of the phenotypes.

2.1 Notation and disease model

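Association between M genetic variants and Q phenotypes is examined, and we denote the coded genotype at variant m and the value of phenotype q for individual j in family i by $x_{ijm}$ and $y_{ijq}$, respectively. We assume there are n families and $n_i$ individuals in family i. Thus, the sample size is $N = \sum_{i=1}^{n} n_i$. We let

$$X_m = (x_{11m}, \ldots, x_{n n_n m})^t, \quad X = (X_1, \ldots, X_M), \quad \text{and}$$
$$Y_q = (y_{11q}, \ldots, y_{n n_n q})^t, \quad Y = (Y_1, \ldots, Y_Q).$$

We also define

$$X_{ij} = (x_{ij1}, \ldots, x_{ijM})^t \quad \text{and} \quad Y_{ij} = (y_{ij1}, \ldots, y_{ijQ})^t.$$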

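The genetic variance-covariance matrix between individuals can be parameterized with the kinship coefficient matrix (KCM), Φ. If we let $\pi_{ij,ij'}$ be the kinship coefficient between individuals j and j' in family i, and let $d_{ij}$ be the inbreeding coefficient of individual j in family i, then $\Phi_i$ is

$$\Phi_i = \begin{bmatrix} 1+d_{i1} & 2\pi_{i1,i2} & 2\pi_{i1,i3} & \cdots \\ 2\pi_{i1,i2} & 1+d_{i2} & 2\pi_{i2,i3} & \cdots \\ 2\pi_{i1,i3} & 2\pi_{i2,i3} & 1+d_{i3} & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{bmatrix},$$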

and we define

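$$\Phi = \begin{bmatrix} \Phi_1 & 0 & \cdots & 0 \\ 0 & \Phi_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \Phi_n \end{bmatrix}.$$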
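As an illustration of how $\Phi_i$ and the block-diagonal Φ fit together, the following pure-Python sketch builds a family block from kinship and inbreeding coefficients and stacks the blocks. The function names and the trio example are hypothetical; in practice the coefficients come from the pedigree, or Φ is replaced by a GRM under population substructure.

```python
def family_phi(kinship, inbreeding):
    """Return Phi_i: 1 + d_ij on the diagonal and 2 * pi_{ij,ij'} off the diagonal."""
    n = len(inbreeding)
    return [[1.0 + inbreeding[a] if a == b else 2.0 * kinship[a][b]
             for b in range(n)] for a in range(n)]


def block_diag(blocks):
    """Stack per-family matrices Phi_1, ..., Phi_n into one block-diagonal Phi."""
    size = sum(len(b) for b in blocks)
    out = [[0.0] * size for _ in range(size)]
    offset = 0
    for b in blocks:
        k = len(b)
        for r in range(k):
            for c in range(k):
                out[offset + r][offset + c] = b[r][c]
        offset += k
    return out


# Hypothetical trio (father, mother, child): parents unrelated, parent-child
# kinship 1/4, no inbreeding. Two such families give a 6 x 6 Phi.
trio_kinship = [[0.5, 0.0, 0.25],
                [0.0, 0.5, 0.25],
                [0.25, 0.25, 0.5]]
phi_trio = family_phi(trio_kinship, [0.0, 0.0, 0.0])
phi = block_diag([phi_trio, phi_trio])
```

Entries between individuals in different families stay zero, which is what makes the block-diagonal layout cheap to store and invert family by family.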

In the presence of population substructure, Φ should be replaced with the genetic relationship matrix (GRM) to provide statistically valid results (Thornton, Tang et al. 2012). The variance-covariance matrix between the M additively coded markers is denoted by Ψ, and we assume that

$$\mathrm{cov}(X_{ij}, X_{ij'}) \approx 2\pi_{ij,ij'}\,\mathrm{var}(X_{ij}) = 2\pi_{ij,ij'}\,\Psi.$$

Then we can easily show that

$$\mathrm{var}(\mathrm{vec}(X)) \approx \Psi \otimes \Phi.$$
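The Kronecker structure can be made concrete with a small sketch: block (m, m') of $\mathrm{var}(\mathrm{vec}(X))$ is $\Psi_{mm'}\Phi$. The values of Ψ and Φ below are hypothetical, chosen only to show the layout, and `kron` is a helper written for this illustration.

```python
def kron(a, b):
    """Kronecker product a (x) b for matrices stored as lists of rows."""
    rb, cb = len(b), len(b[0])
    return [[a[i // rb][j // cb] * b[i % rb][j % cb]
             for j in range(len(a[0]) * cb)]
            for i in range(len(a) * rb)]


# Hypothetical inputs: two markers with covariance 0.2 (Psi), and a sibling
# pair with 2 * kinship = 0.5 (Phi).
psi = [[1.0, 0.2], [0.2, 1.0]]
phi = [[1.0, 0.5], [0.5, 1.0]]
# 4 x 4 result; rows/columns follow vec(X): (marker 1: sib 1, sib 2; marker 2: sib 1, sib 2)
cov_vec_x = kron(psi, phi)
```

For example, the covariance between marker 1 in sib 1 and marker 2 in sib 2 is $\Psi_{12}\Phi_{12} = 0.2 \times 0.5$.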

2.2 Choice of offset

It has been shown that the statistical efficiency of test statistics in retrospective analysis can be improved by adjusting phenotypes for relevant covariates (Lange, DeMeo et al. 2002). For our score statistic, we introduce a new parameter $\mu_{ijq}$ for phenotype q of individual j in family i, which we call the offset in the remainder of this report (Won and Lange 2013). We set

$$\mu_{ij} = (\mu_{ij1}, \ldots, \mu_{ijQ})^t, \quad \mu = (\mu_{11}^t, \ldots, \mu_{n n_n}^t)^t, \quad T_{ij} = Y_{ij} - \mu_{ij}, \quad T = Y - \mu.$$

Statistical efficiency depends on μ, so its elements need to be selected carefully. The offset μ can either be calculated by the best linear unbiased predictor (BLUP) with covariates, as is done for SKAT, or set to the disease prevalence (Won and Lange 2013). The most efficient μ depends on the sampling scheme. If families are randomly selected, BLUP was shown to be most efficient for both dichotomous and quantitative phenotypes (Won and Lange 2013), whereas the prevalence is recommended for dichotomous phenotypes if families with a large number of affected members are selected (Thornton and McPeek 2007, Won and Lange 2013). Therefore, we chose BLUP and prevalence as offsets for quantitative and dichotomous phenotypes, respectively.
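A minimal sketch of forming $T = Y - \mu$ with these offsets, assuming a hypothetical data set with one dichotomous phenotype (offset equal to an assumed prevalence of 0.1) and one quantitative phenotype whose BLUP offsets are assumed to have been computed elsewhere; fitting the BLUP itself is not shown.

```python
def prevalence_offset(n_individuals, prevalence):
    """Offset for a dichotomous phenotype: every individual gets the prevalence."""
    return [prevalence] * n_individuals


def residuals(y, mu):
    """T = Y - mu, element-wise, for N x Q matrices stored as lists of rows."""
    return [[y_iq - m_iq for y_iq, m_iq in zip(y_row, m_row)]
            for y_row, m_row in zip(y, mu)]


# Hypothetical data: 4 individuals; column 0 is dichotomous (1 = affected),
# column 1 is quantitative with assumed precomputed BLUP offsets.
y = [[1, 2.3], [0, 1.1], [0, 0.4], [1, 1.9]]
dich_mu = prevalence_offset(4, 0.1)
quant_mu = [1.5, 1.2, 0.8, 1.6]
mu = [[d, q] for d, q in zip(dich_mu, quant_mu)]
t = residuals(y, mu)
```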

2.3 Score for quasi-likelihood

Let $e_{ij}$ be an N-dimensional vector whose $(j + \sum_{i'=1}^{i-1} n_{i'})$th element is 1 and whose other elements are 0, and let $\mathbf{1}_w$ be a column vector of w elements all equal to 1. We denote the effect of rare variant m on phenotype q by $\beta_{mq}$, the regression coefficient of phenotype q on variant m. Because we use a score statistic, $\beta_{mq}$ does not need to be estimated. However, for rare variants, statistics that test each $\beta_{mq}$ separately can have inflated false positive rates and large false negative rates, so collapsed genotype scores are used instead. Under the null hypothesis $\beta_{11} = \cdots = \beta_{MQ} = 0$, the best linear unbiased estimator (BLUE) of $E(X_m)$ (McPeek, Wu et al. 2004) is

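$$\mathbf{1}_N(\mathbf{1}_N^t\Phi^{-1}\mathbf{1}_N)^{-1}\mathbf{1}_N^t\Phi^{-1}X_m,$$

and if we let $A = \Phi^{-1} - \Phi^{-1}\mathbf{1}_N(\mathbf{1}_N^t\Phi^{-1}\mathbf{1}_N)^{-1}\mathbf{1}_N^t\Phi^{-1}$, we can define $S_{ijm}$ for individual j in family i by

$$S_{ijm} = (T_{ij}e_{ij}^t)\,\Phi A X_m.$$

Based on MFQLS (Won, Kim et al. 2015), the score vector for the M variants can be defined by

$$S = (S_1, \ldots, S_M) = T^t\Phi A X, \quad \text{where } S_m = \sum_{i,j} S_{ijm},$$

and because $\mathrm{var}(\mathrm{vec}(X)) \approx \Psi \otimes \Phi$, the variance-covariance matrix of S is approximately

$$\mathrm{var}(\mathrm{vec}(S)) \approx \Psi \otimes (T^t\Phi A\Phi T).$$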
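The score and its variance can be computed without forming A explicitly. From the definition of A, $\Phi A = I_N - \mathbf{1}_N c\,\mathbf{1}_N^t\Phi^{-1}$ with $c = (\mathbf{1}_N^t\Phi^{-1}\mathbf{1}_N)^{-1}$, so $\Phi A X$ is X centred at the BLUE of its mean, and $T^t\Phi A\Phi T = T^t\Phi T - c\,(T^t\mathbf{1}_N)(\mathbf{1}_N^tT)$. The pure-Python sketch below uses these identities with dense matrices for clarity; it is an illustration under those assumptions, not the mFARVAT C++ implementation, and a practical version would exploit the block-diagonal structure of Φ. Function names are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def score_and_variance(t, x, phi):
    """Return S = T^t Phi A X (Q x M) and V = T^t Phi A Phi T (Q x Q)."""
    n, q_dim, m_dim = len(phi), len(t[0]), len(x[0])
    u = solve(phi, [1.0] * n)            # u = Phi^{-1} 1_N
    c = 1.0 / sum(u)                     # c = (1^t Phi^{-1} 1)^{-1}
    # BLUE of the mean genotype for each variant, then Phi A X = X - 1 * blue^t
    blue = [c * sum(u[i] * x[i][m] for i in range(n)) for m in range(m_dim)]
    xc = [[x[i][m] - blue[m] for m in range(m_dim)] for i in range(n)]
    s = [[sum(t[i][q] * xc[i][m] for i in range(n)) for m in range(m_dim)]
         for q in range(q_dim)]
    tsum = [sum(t[i][q] for i in range(n)) for q in range(q_dim)]  # T^t 1_N
    v = [[sum(t[i][a] * phi[i][j] * t[j][b] for i in range(n) for j in range(n))
          - c * tsum[a] * tsum[b] for b in range(q_dim)] for a in range(q_dim)]
    return s, v


# Usage: three unrelated individuals (Phi = I), one phenotype, one variant.
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
s, v = score_and_variance([[1.0], [-1.0], [0.0]], [[0.0], [1.0], [2.0]], identity)
# s == [[-1.0]] and v == [[2.0]]
```

With Φ = I the BLUE reduces to the sample mean, so S is simply the covariance-like sum of T against centred genotypes.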

2.4 Homogeneous mFARVAT

The effect of each causal variant on a phenotype, estimated as the regression coefficient of the phenotype on the causal variant, can be in the same or different directions across phenotypes, and we propose two different statistics for these two scenarios. Our first statistic, homogeneous mFARVAT, assumes that the effects of each causal variant on the multiple phenotypes are in the same direction, as is plausible, for example, when the phenotypes are highly correlated or are longitudinal measurements. For rare variant association analysis, burden tests regress phenotypes on the sum of genotype scores over rare variants. Analogously, the association of the Q phenotypes with variant m can be assessed by testing whether $\beta_{m1} + \cdots + \beta_{mQ} = 0$, which leads to a statistic based on $\mathbf{1}_Q^tS$.

The importance of each variant often differs, and statistical efficiency can be improved by weighting each variant according to its relative importance (Madsen and Browning 2009). Relative importance is usually expressed as a function of the minor allele frequency (MAF). Let $w_m$ be the weight for variant m and W the M × M diagonal matrix with diagonal elements $w_m$; we choose $w_m = \mathrm{beta}(p_m, a_1, a_2)$, as proposed by Wu et al. (Wu, Lee et al. 2011), where $p_m$ is the MAF of variant m, and $a_1$ and $a_2$ are set to 1 and 25, respectively. $\mathrm{beta}(p_m, a_1, a_2)$ is flexible because it can accommodate a broad range of scenarios through different choices of $a_1$ and $a_2$, and Wu et al. found these choices to be efficient in many settings. Then the scores for the burden and SKAT tests are, respectively,
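Interpreting $\mathrm{beta}(p_m, a_1, a_2)$ as the Beta($a_1$, $a_2$) density evaluated at the MAF, which is how SKAT-style weights are usually defined, the weights can be sketched as follows. With $a_1 = 1$ and $a_2 = 25$ the density simplifies to $25(1-p_m)^{24}$, so rarer variants receive larger weights; the function name is hypothetical.

```python
import math


def beta_weight(maf, a1=1.0, a2=25.0):
    """Beta(a1, a2) density evaluated at the minor allele frequency."""
    log_norm = math.lgamma(a1 + a2) - math.lgamma(a1) - math.lgamma(a2)
    return math.exp(log_norm) * maf ** (a1 - 1.0) * (1.0 - maf) ** (a2 - 1.0)


# Weights decrease as the MAF increases.
weights = [beta_weight(p) for p in (0.001, 0.01, 0.04)]
```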

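$$\frac{1}{\mathbf{1}_Q^tT^t\Phi A\Phi T\mathbf{1}_Q}\,\mathbf{1}_Q^tSW\mathbf{1}_M\mathbf{1}_M^tWS^t\mathbf{1}_Q,$$

and

$$\frac{1}{\mathbf{1}_Q^tT^t\Phi A\Phi T\mathbf{1}_Q}\,\mathbf{1}_Q^tSWWS^t\mathbf{1}_Q.$$

If we let

$$R_\rho^{\mathrm{Hom}} = (1-\rho)I_M + \rho\,\mathbf{1}_M\mathbf{1}_M^t,$$

the scores for the burden and SKAT tests can be generalized as

$$MS_\rho^{\mathrm{Hom}} = \frac{1}{\mathbf{1}_Q^tT^t\Phi A\Phi T\mathbf{1}_Q}\,\mathbf{1}_Q^tSWR_\rho^{\mathrm{Hom}}WS^t\mathbf{1}_Q,$$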

where the optimal choice of ρ depends on the distribution of rare variant effects on the multiple phenotypes.
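Because $R_\rho^{\mathrm{Hom}} = (1-\rho)I_M + \rho\mathbf{1}_M\mathbf{1}_M^t$, the quadratic form in $MS_\rho^{\mathrm{Hom}}$ reduces to $(1-\rho)\sum_m (w_m z_m)^2 + \rho(\sum_m w_m z_m)^2$ with $z = S^t\mathbf{1}_Q$, so ρ = 0 gives the SKAT score and ρ = 1 the burden score. A sketch, with a hypothetical function name and inputs assumed to be S, the weights, and $T^t\Phi A\Phi T$:

```python
def ms_hom(s, w, v, rho):
    """Generalized homogeneous score MS_rho^Hom.

    s: Q x M score matrix S; w: weights w_1..w_M (diagonal of W);
    v: Q x Q matrix T^t Phi A Phi T; rho: value in [0, 1].
    """
    q_dim, m_dim = len(s), len(s[0])
    z = [sum(s[q][m] for q in range(q_dim)) for m in range(m_dim)]  # S^t 1_Q
    wz = [w[m] * z[m] for m in range(m_dim)]                        # W S^t 1_Q
    denom = sum(v[a][b] for a in range(q_dim) for b in range(q_dim))  # 1^t V 1
    quad = (1.0 - rho) * sum(x * x for x in wz) + rho * sum(wz) ** 2
    return quad / denom


# Usage: one phenotype, two equally weighted variants.
skat_score = ms_hom([[1.0, 2.0]], [1.0, 1.0], [[2.0]], rho=0.0)    # 2.5
burden_score = ms_hom([[1.0, 2.0]], [1.0, 1.0], [[2.0]], rho=1.0)  # 4.5
```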

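We denote the eigenvalues of $\Psi^{1/2}WR_\rho^{\mathrm{Hom}}W\Psi^{1/2}$ by $(\lambda_{1\rho}, \ldots, \lambda_{M\rho})$. If we let $\chi^2_{1,m}$, m = 1, …, M, be independent chi-square random variables with one degree of freedom, we have

$$MS_\rho^{\mathrm{Hom}} \sim \sum_{m=1}^{M}\lambda_{m\rho}\,\chi^2_{1,m} \quad \text{under } H_0.$$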
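Tail probabilities of $\sum_m \lambda_{m\rho}\chi^2_{1,m}$ are typically evaluated with numerical methods such as Davies' algorithm. For intuition only, a Monte Carlo approximation is sketched below; it assumes the eigenvalues have already been obtained (for example, from a numerical linear algebra library) and is far too slow for p-values near genome-wide thresholds. The function name is hypothetical.

```python
import random


def mixture_pvalue_mc(stat, lambdas, n_draws=100_000, seed=1):
    """Monte Carlo estimate of P(sum_m lambda_m * chi2_{1,m} >= stat).

    Each chi2_{1,m} is simulated as the square of an independent N(0, 1) draw;
    the +1 in numerator and denominator keeps the estimate strictly positive.
    """
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_draws):
        draw = sum(lam * rng.gauss(0.0, 1.0) ** 2 for lam in lambdas)
        if draw >= stat:
            exceed += 1
    return (exceed + 1) / (n_draws + 1)
```

With eigenvalues (1, 1) the mixture is a chi-square with two degrees of freedom, whose exact tail $e^{-x/2}$ gives a convenient check.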

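If we denote the p-value for $MS_\rho^{\mathrm{Hom}}$ by $p_{MS_\rho^{\mathrm{Hom}}}$, and let $p_{\mathrm{mFARVAT}_S^{\mathrm{Hom}}} = p_{MS_0^{\mathrm{Hom}}}$ and $p_{\mathrm{mFARVAT}_B^{\mathrm{Hom}}} = p_{MS_1^{\mathrm{Hom}}}$, the SKAT-O mFARVAT ($\mathrm{mFARVAT}_O$) statistic is defined by

$$\mathrm{mFARVAT}_O^{\mathrm{Hom}} = \min\{p_{MS_0^{\mathrm{Hom}}}, p_{MS_{0.1^2}^{\mathrm{Hom}}}, \ldots, p_{MS_{0.5^2}^{\mathrm{Hom}}}, p_{MS_1^{\mathrm{Hom}}}\}.$$

Its p-value, denoted by $p_{\mathrm{mFARVAT}_O^{\mathrm{Hom}}}$ in the remainder of this report, can be calculated with the numerical algorithm for SKAT-O (Lee, Wu et al. 2012) with a small modification (see Supplementary Text 1 for the detailed algorithm).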
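The SKAT-O-type statistic itself is simply a minimum over the ρ grid. The sketch below assumes the elided grid values are $0.2^2$, $0.3^2$ and $0.4^2$ and takes the per-ρ p-value function as given; the minimum is not itself a valid p-value, which is why the modified SKAT-O algorithm is needed to calibrate it. Names are hypothetical.

```python
# Assumed grid: 0, 0.1^2, ..., 0.5^2, 1 with the elided values filled as 0.2^2, 0.3^2, 0.4^2.
RHO_GRID = [0.0, 0.1 ** 2, 0.2 ** 2, 0.3 ** 2, 0.4 ** 2, 0.5 ** 2, 1.0]


def skat_o_statistic(pvalue_for_rho):
    """Return (min p-value over RHO_GRID, the rho attaining it).

    pvalue_for_rho: any callable returning p_{MS_rho} for a given rho.
    """
    pvalues = {rho: pvalue_for_rho(rho) for rho in RHO_GRID}
    best_rho = min(pvalues, key=pvalues.get)
    return pvalues[best_rho], best_rho
```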

2.5 Heterogeneous mFARVAT

The effect of each variant on the phenotypes can be heterogeneous in certain situations, and it may then be reasonable to consider these effects separately. We therefore construct statistics based on vec(S); under the null hypothesis $\beta_{11} = \cdots = \beta_{MQ} = 0$, we have

$$E\{\mathrm{vec}(S)\} = 0 \quad \text{and} \quad \mathrm{var}\{\mathrm{vec}(S)\} = \Psi \otimes T^t\Phi A\Phi T.$$

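Let $I_w$ denote the w × w identity matrix, and let

$$R_\rho^{\mathrm{Het}} = (1-\rho)I_{MQ} + \rho\,\mathbf{1}_{MQ}\mathbf{1}_{MQ}^t;$$

we define the generalized score by

$$MS_\rho^{\mathrm{Het}} = \mathrm{vec}(S)^t(I_Q \otimes W)R_\rho^{\mathrm{Het}}(I_Q \otimes W)\,\mathrm{vec}(S).$$

Then the burden and SKAT tests can be expressed as

$$MS_1^{\mathrm{Het}} = \mathrm{vec}(S)^t(I_Q \otimes W)\mathbf{1}_{MQ}\mathbf{1}_{MQ}^t(I_Q \otimes W)\,\mathrm{vec}(S),$$
$$MS_0^{\mathrm{Het}} = \mathrm{vec}(S)^t(I_Q \otimes W)(I_Q \otimes W)\,\mathrm{vec}(S).$$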
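For a given ρ, the heterogeneous score can be sketched without building the MQ × MQ matrices: each entry $S_{qm}$ is multiplied by the weight $w_m$ of its variant, and the quadratic form with $R_\rho^{\mathrm{Het}}$ reduces to $(1-\rho)\sum u^2 + \rho(\sum u)^2$ over the MQ weighted entries u. Because these sums do not depend on the order in which vec(S) stacks the entries, the sketch avoids committing to a stacking convention. The function name is hypothetical.

```python
def ms_het(s, w, rho):
    """Generalized heterogeneous score MS_rho^Het for a Q x M score matrix S.

    Each entry S[q][m] is weighted by w[m]; with R = (1 - rho) I + rho 1 1^t the
    quadratic form equals (1 - rho) * sum(u^2) + rho * (sum u)^2.
    """
    u = [w[m] * s_qm for row in s for m, s_qm in enumerate(row)]
    return (1.0 - rho) * sum(x * x for x in u) + rho * sum(u) ** 2


# Usage: Q = 2 phenotypes, M = 2 variants.
het_skat = ms_het([[1.0, 2.0], [3.0, 4.0]], [1.0, 1.0], rho=0.0)    # 30.0
het_burden = ms_het([[1.0, 2.0], [3.0, 4.0]], [1.0, 1.0], rho=1.0)  # 100.0
```

Unlike the homogeneous score, phenotype-specific contributions are not summed before squaring at ρ = 0, so effects of opposite sign on different phenotypes do not cancel.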

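If we let $(\lambda'_{1\rho}, \ldots, \lambda'_{MQ\rho})$ be the eigenvalues of

$$\left(\Psi^{1/2} \otimes (T^t\Phi A\Phi T)^{1/2}\right)(I_Q \otimes W)R_\rho^{\mathrm{Het}}(I_Q \otimes W)\left(\Psi^{1/2} \otimes (T^t\Phi A\Phi T)^{1/2}\right),$$

then we have

$$MS_\rho^{\mathrm{Het}} \sim \sum_{l=1}^{MQ}\lambda'_{l\rho}\,\chi^2_{1,l} \quad \text{under } H_0.$$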

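P-values for $MS_\rho^{\mathrm{Het}}$ are denoted by $p_{MS_\rho^{\mathrm{Het}}}$, and we let $p_{\mathrm{mFARVAT}_S^{\mathrm{Het}}} = p_{MS_0^{\mathrm{Het}}}$ and $p_{\mathrm{mFARVAT}_B^{\mathrm{Het}}} = p_{MS_1^{\mathrm{Het}}}$. We consider

$$\mathrm{mFARVAT}_O^{\mathrm{Het}} = \min\{p_{MS_0^{\mathrm{Het}}}, p_{MS_{0.1^2}^{\mathrm{Het}}}, \ldots, p_{MS_{0.5^2}^{\mathrm{Het}}}, p_{MS_1^{\mathrm{Het}}}\}.$$

We denote the p-value for $\mathrm{mFARVAT}_O^{\mathrm{Het}}$ by $p_{\mathrm{mFARVAT}_O^{\mathrm{Het}}}$; the detailed algorithm for calculating its asymptotic p-value is provided in Supplementary Text 2.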

2.6 The simulation model

To evaluate mFARVAT, we simulated large families spanning three generations and consisting of 10 members each (see Supplementary Figure 1). A total of 5,000 haplotypes of 50,000 base pairs were generated under a coalescent model using the software COSI (Schaffner, Foo et al. 2005), with the mutation rate set at $1.5\times10^{-8}$. Haplotypes were randomly chosen with replacement to build founder genotypes. Nonfounder haplotypes were determined in Mendelian fashion from pairs of parents under the assumption of no recombination. Among the simulated haplotypes, we defined variants with sample MAFs less than 0.01 as rare, and 60 rare variants were randomly selected.
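The Mendelian gene-dropping step without recombination can be sketched as follows. The pedigree, haplotype labels, and function name are hypothetical (the actual 10-member, three-generation structure is given in Supplementary Figure 1), and COSI haplotype generation is not reproduced.

```python
import random


def drop_genes(founder_pool, pedigree, seed=0):
    """Assign two haplotypes to every pedigree member, with no recombination.

    founder_pool: list of haplotypes (any hashable objects).
    pedigree: (person, father, mother) tuples, parents listed before children;
              founders have father = mother = None.
    """
    rng = random.Random(seed)
    haps = {}
    for person, father, mother in pedigree:
        if father is None:
            # Founder: two haplotypes drawn with replacement from the pool.
            haps[person] = (rng.choice(founder_pool), rng.choice(founder_pool))
        else:
            # Nonfounder: one intact haplotype transmitted from each parent.
            haps[person] = (rng.choice(haps[father]), rng.choice(haps[mother]))
    return haps


# Hypothetical three-generation pedigree.
ped = [("gf", None, None), ("gm", None, None), ("dad", "gf", "gm"),
       ("mom", None, None), ("kid1", "dad", "mom"), ("kid2", "dad", "mom")]
pool = ["hapA", "hapB", "hapC", "hapD"]
haplotypes = drop_genes(pool, ped, seed=3)
```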

Phenotypes were generated under the null and alternative hypotheses, and we considered both quantitative and dichotomous phenotypes. Quantitative phenotypes were defined as the sum of the phenotypic mean, polygenic effect, main genetic effect and random error, and we assumed there was no environmental effect shared between family members. Phenotypic means were denoted by $\alpha_1, \ldots, \alpha_Q$. We assumed that $\alpha_1 = 0$ and $\alpha_2 = 0.3$ for Q = 2, and $\alpha_1 = \alpha_2 = \alpha_3 = 0$ and $\alpha_4 = \alpha_5 = 0.3$ for Q = 5. The polygenic effects for the Q phenotypes of each founder were independently generated from $\mathrm{MVN}(0, \Sigma_B)$, and for each nonfounder the average of the maternal and paternal polygenic effects was added to a value independently sampled from $\mathrm{MVN}(0, 0.5\Sigma_B)$. Random errors for the Q phenotypes were assumed to be independent, so the random error for phenotype q was independently sampled from $N(0, \sigma^2_{E,q})$. If Q = 2, we assumed that

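$$\Sigma_B = \begin{bmatrix} 1 & \sqrt{2}c \\ \sqrt{2}c & 2 \end{bmatrix}, \quad \sigma^2_{E,1} = 1, \quad \sigma^2_{E,2} = 2,$$

and if Q = 5, they were

$$\Sigma_B = \begin{bmatrix} 1 & c & \sqrt{2}c & \sqrt{2}c & \sqrt{2}c \\ c & 1 & \sqrt{2}c & \sqrt{2}c & \sqrt{2}c \\ \sqrt{2}c & \sqrt{2}c & 2 & 2c & 2c \\ \sqrt{2}c & \sqrt{2}c & 2c & 2 & 2c \\ \sqrt{2}c & \sqrt{2}c & 2c & 2c & 2 \end{bmatrix},$$
$$\sigma^2_{E,1} = 1, \quad \sigma^2_{E,2} = 2, \quad \sigma^2_{E,3} = 3, \quad \sigma^2_{E,4} = 4, \quad \sigma^2_{E,5} = 5.$$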

For c we chose 0.5 and 0.8.
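A sketch of this polygenic-effect model: founders draw from $\mathrm{MVN}(0, \Sigma_B)$ and each nonfounder receives the mid-parent value plus an $\mathrm{MVN}(0, 0.5\Sigma_B)$ draw, generated via a Cholesky factor. For Q = 2 the off-diagonal entry of $\Sigma_B$ is taken here as $\sqrt{2}c$, an assumption under which the polygenic correlation between the two phenotypes (variances 1 and 2) equals c. Function names are hypothetical.

```python
import math
import random


def cholesky(a):
    """Lower-triangular L with L L^t = a (a symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def mvn(low, scale, rng):
    """Draw from MVN(0, scale * Sigma), where low is the Cholesky factor of Sigma."""
    z = [rng.gauss(0.0, 1.0) for _ in low]
    return [math.sqrt(scale) * sum(row[k] * z[k] for k in range(len(z))) for row in low]


def polygenic(sigma_b, father=None, mother=None, rng=random):
    """Founders: MVN(0, Sigma_B). Nonfounders: mid-parent value + MVN(0, 0.5 Sigma_B)."""
    low = cholesky(sigma_b)
    if father is None:
        return mvn(low, 1.0, rng)
    noise = mvn(low, 0.5, rng)
    return [(f + m) / 2.0 + e for f, m, e in zip(father, mother, noise)]


# Q = 2, c = 0.8, with the assumed sqrt(2) * c off-diagonal entry.
c = 0.8
sigma_b = [[1.0, math.sqrt(2.0) * c], [math.sqrt(2.0) * c, 2.0]]
rng = random.Random(42)
dad = polygenic(sigma_b, rng=rng)
mom = polygenic(sigma_b, rng=rng)
kid = polygenic(sigma_b, dad, mom, rng=rng)
```

The 0.5 scaling of the Mendelian-sampling term keeps the nonfounder's polygenic variance equal to $\Sigma_B$ when the parents are unrelated.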

The genetic effect at variant m for phenotype q was the product of $\beta_{mq}$ and the number of disease susceptibility alleles. Under the null hypothesis, $\beta_{mq}$ was set to 0. Under the alternative hypothesis, if we let $h_a^2$ be the proportion of variance explained by rare variants, $\beta_{mq}$ was sampled from $U(0, \nu_q)$, where

$$\nu_q = \frac{(\sigma^2_{B,q} + \sigma^2_{E,q})\,h_a^2}{(1 - h_a^2)\sum_{m=1}^{M}\beta_{mq}^2\,2p_m(1-p_m)}.$$

Here $\sigma^2_{B,q}$ denotes the (q,q)th element of $\Sigma_B$, and we assumed that $h_a^2 = 0.02$. $\beta_{mq}$ was generated under both homogeneous and heterogeneous scenarios. For homogeneous scenarios, we assumed that the effects of each rare variant on different phenotypes are similar; for example, the ratios $\beta_{m1} : \cdots : \beta_{mQ}$ were assumed to be 1:0.9 if Q = 2, and 1:0.9:0.8:0.7:0.6 if Q = 5. For heterogeneous scenarios, the effects of each rare variant on the phenotypes were independently generated from $U(0, \nu_q)$.

Dichotomous phenotypes were simulated using the liability threshold model. Once the quantitative phenotypes with genetic effect, polygenic effect and random error were generated, individuals whose values exceeded a threshold were coded as affected, and all others as unaffected. The threshold was chosen to preserve the assumed disease prevalence. We assumed that the prevalences of the phenotypes were 0.1 and 0.2 for Q = 2, and 0.1, 0.2, 0.2, 0.3, and 0.3 for Q = 5. To allow for ascertainment bias of dichotomous phenotypes in our simulation studies, we assumed that only families with at least one affected individual were selected for analysis.
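The liability-threshold and ascertainment steps can be sketched as below. Here the threshold is taken as an empirical quantile of the simulated liabilities, which is an assumption (a theoretical quantile would serve equally well), and ties at the threshold are assumed absent; names and data are hypothetical.

```python
def dichotomize(liabilities, prevalence):
    """Code the top `prevalence` fraction of liabilities as affected (1), others 0.

    The threshold is the k-th largest liability, k = round(prevalence * N), so the
    sample prevalence matches the target up to rounding (assuming no ties).
    """
    n_affected = round(prevalence * len(liabilities))
    if n_affected == 0:
        return [0] * len(liabilities)
    threshold = sorted(liabilities, reverse=True)[n_affected - 1]
    return [1 if y >= threshold else 0 for y in liabilities]


def ascertain(families):
    """Keep only families (lists of 0/1 statuses) with at least one affected member."""
    return [fam for fam in families if any(fam)]


liab = [0.3, 2.1, -0.5, 1.4, 0.9, -1.2, 0.0, 0.7, 1.8, -0.1]
status = dichotomize(liab, 0.2)   # the two largest liabilities become affected
```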

3 RESULTS

3.1 Evaluation of mFARVAT with simulated data

To evaluate statistical validity, type-1 error estimates for both dichotomous and quantitative phenotypes were calculated at various significance levels using 20,000 replicates of two hundred extended families, so that each replicate sample contained 2,000 individuals. Supplementary Table 1 shows empirical type-1 error estimates for homogeneous mFARVAT ($\mathrm{mFARVAT}^{\mathrm{Hom}}$) and heterogeneous mFARVAT ($\mathrm{mFARVAT}^{\mathrm{Het}}$) at the 0.05, 0.01, 0.001, and $2.5\times10^{-6}$ significance levels. The estimates are very close to the nominal significance levels for both quantitative and dichotomous phenotypes. Quantile-quantile (QQ) plots in Supplementary Figures 2 and 3 show consistent results, and we conclude that $\mathrm{mFARVAT}^{\mathrm{Het}}$ and $\mathrm{mFARVAT}^{\mathrm{Hom}}$ are statistically valid.

Empirical power estimates were calculated at the $10^{-4}$ significance level with correlations c = 0.5 and 0.8 for quantitative phenotypes (for the underlying quantitative phenotypes in the case of dichotomous phenotypes). We considered two scenarios, in which either all or half of the rare variants were causal, and assumed that 50%, 80% or 100% of causal variants were deleterious, with the rest being protective. Empirical power estimates were calculated with 2,000 replicates for six statistics: (1) $\mathrm{mFARVAT}_O^{\mathrm{Het}}$; (2) $\mathrm{mFARVAT}_O^{\mathrm{Hom}}$; (3) $\mathrm{mFARVAT}_S^{\mathrm{Het}}$; (4) $\mathrm{mFARVAT}_S^{\mathrm{Hom}}$; (5) $\mathrm{mFARVAT}_B^{\mathrm{Het}}$; (6) $\mathrm{mFARVAT}_B^{\mathrm{Hom}}$. Results are provided in Tables 1–3 and Tables 4–6, which correspond to scenarios where all or half of the rare variants are causal, respectively. Notably, each method performed similarly in both scenarios, although the empirical power estimates are higher when causal variants are more abundant.

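Table 1. Empirical power estimates when all rare variants are causal and 100% of them are deleterious.

Empirical power of $\mathrm{mFARVAT}_S^{\mathrm{Het}}$, $\mathrm{mFARVAT}_B^{\mathrm{Het}}$, $\mathrm{mFARVAT}_O^{\mathrm{Het}}$, $\mathrm{mFARVAT}_B^{\mathrm{Hom}}$, $\mathrm{mFARVAT}_S^{\mathrm{Hom}}$ and $\mathrm{mFARVAT}_O^{\mathrm{Hom}}$ was calculated for dichotomous and quantitative multiple phenotypes (Q = 2 and Q = 5) with homogeneous and heterogeneous effects and different correlations (c = 0.5 and c = 0.8) at the $10^{-4}$ significance level. Empirical power of FARVAT was calculated by applying a Bonferroni correction to the minimum p-value of the univariate association tests on the multiple phenotypes. Type: D, dichotomous; Q, quantitative. Eff: Het, heterogeneous effects; Hom, homogeneous effects.

| Q | Type | c | Eff | FARVAT SKAT | FARVAT Burden | FARVAT SKAT-O | $\mathrm{mFARVAT}^{\mathrm{Het}}$ SKAT | $\mathrm{mFARVAT}^{\mathrm{Het}}$ Burden | $\mathrm{mFARVAT}^{\mathrm{Het}}$ SKAT-O | $\mathrm{mFARVAT}^{\mathrm{Hom}}$ SKAT | $\mathrm{mFARVAT}^{\mathrm{Hom}}$ Burden | $\mathrm{mFARVAT}^{\mathrm{Hom}}$ SKAT-O |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | D | 0.5 | Het | 0.208 | 0.712 | 0.738 | 0.331 | 0.908 | 0.912 | 0.337 | 0.896 | 0.900 |
| 2 | D | 0.5 | Hom | 0.196 | 0.766 | 0.778 | 0.353 | 0.928 | 0.927 | 0.439 | 0.915 | 0.925 |
| 2 | D | 0.8 | Het | 0.200 | 0.713 | 0.723 | 0.310 | 0.876 | 0.875 | 0.290 | 0.865 | 0.859 |
| 2 | D | 0.8 | Hom | 0.201 | 0.705 | 0.729 | 0.333 | 0.865 | 0.874 | 0.373 | 0.853 | 0.874 |
| 2 | Q | 0.5 | Het | 0.350 | 0.987 | 0.987 | 0.531 | 0.998 | 0.998 | 0.593 | 0.999 | 0.998 |
| 2 | Q | 0.5 | Hom | 0.396 | 0.984 | 0.979 | 0.574 | 0.998 | 0.998 | 0.755 | 0.996 | 0.997 |
| 2 | Q | 0.8 | Het | 0.251 | 0.980 | 0.979 | 0.490 | 0.995 | 0.999 | 0.486 | 0.995 | 0.995 |
| 2 | Q | 0.8 | Hom | 0.365 | 0.977 | 0.977 | 0.509 | 0.996 | 0.995 | 0.607 | 0.996 | 0.995 |
| 5 | D | 0.5 | Het | 0.317 | 0.924 | 0.934 | 0.839 | 1.000 | 1.000 | 0.826 | 1.000 | 1.000 |
| 5 | D | 0.5 | Hom | 0.315 | 0.948 | 0.955 | 0.868 | 1.000 | 1.000 | 0.947 | 1.000 | 1.000 |
| 5 | D | 0.8 | Het | 0.267 | 0.887 | 0.900 | 0.706 | 0.991 | 0.995 | 0.635 | 0.990 | 0.992 |
| 5 | D | 0.8 | Hom | 0.265 | 0.893 | 0.914 | 0.756 | 0.995 | 0.995 | 0.814 | 0.995 | 0.995 |
| 5 | Q | 0.5 | Het | 0.540 | 0.998 | 0.998 | 0.952 | 1.000 | 1.000 | 0.973 | 1.000 | 1.000 |
| 5 | Q | 0.5 | Hom | 0.602 | 1.000 | 1.000 | 0.968 | 1.000 | 1.000 | 0.999 | 1.000 | 1.000 |
| 5 | Q | 0.8 | Het | 0.495 | 0.992 | 0.993 | 0.879 | 1.000 | 1.000 | 0.836 | 1.000 | 1.000 |
| 5 | Q | 0.8 | Hom | 0.525 | 0.994 | 0.994 | 0.890 | 1.000 | 1.000 | 0.957 | 1.000 | 1.000 |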

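Table 3. Empirical power estimates when all rare variants are causal and 50% of them are deleterious.

Empirical power of $\mathrm{mFARVAT}_S^{\mathrm{Het}}$, $\mathrm{mFARVAT}_B^{\mathrm{Het}}$, $\mathrm{mFARVAT}_O^{\mathrm{Het}}$, $\mathrm{mFARVAT}_B^{\mathrm{Hom}}$, $\mathrm{mFARVAT}_S^{\mathrm{Hom}}$ and $\mathrm{mFARVAT}_O^{\mathrm{Hom}}$ was calculated for dichotomous and quantitative multiple phenotypes (Q = 2 and Q = 5) with homogeneous and heterogeneous effects and different correlations (c = 0.5 and c = 0.8) at the $10^{-4}$ significance level. Empirical power of FARVAT was calculated by applying a Bonferroni correction to the minimum p-value of the univariate association tests on the multiple phenotypes. Type: D, dichotomous; Q, quantitative. Eff: Het, heterogeneous effects; Hom, homogeneous effects.

| Q | Type | c | Eff | FARVAT SKAT | FARVAT Burden | FARVAT SKAT-O | $\mathrm{mFARVAT}^{\mathrm{Het}}$ SKAT | $\mathrm{mFARVAT}^{\mathrm{Het}}$ Burden | $\mathrm{mFARVAT}^{\mathrm{Het}}$ SKAT-O | $\mathrm{mFARVAT}^{\mathrm{Hom}}$ SKAT | $\mathrm{mFARVAT}^{\mathrm{Hom}}$ Burden | $\mathrm{mFARVAT}^{\mathrm{Hom}}$ SKAT-O |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | D | 0.5 | Het | 0.038 | 0.000 | 0.028 | 0.069 | 0.000 | 0.048 | 0.020 | 0.000 | 0.016 |
| 2 | D | 0.5 | Hom | 0.050 | 0.000 | 0.031 | 0.120 | 0.003 | 0.081 | 0.181 | 0.002 | 0.133 |
| 2 | D | 0.8 | Het | 0.064 | 0.000 | 0.038 | 0.087 | 0.000 | 0.068 | 0.016 | 0.000 | 0.010 |
| 2 | D | 0.8 | Hom | 0.052 | 0.003 | 0.039 | 0.100 | 0.007 | 0.082 | 0.127 | 0.004 | 0.101 |
| 2 | Q | 0.5 | Het | 0.330 | 0.000 | 0.236 | 0.489 | 0.001 | 0.386 | 0.121 | 0.001 | 0.070 |
| 2 | Q | 0.5 | Hom | 0.335 | 0.001 | 0.240 | 0.481 | 0.003 | 0.405 | 0.657 | 0.005 | 0.566 |
| 2 | Q | 0.8 | Het | 0.341 | 0.001 | 0.246 | 0.431 | 0.000 | 0.327 | 0.103 | 0.000 | 0.064 |
| 2 | Q | 0.8 | Hom | 0.312 | 0.002 | 0.230 | 0.410 | 0.008 | 0.329 | 0.533 | 0.007 | 0.433 |
| 5 | D | 0.5 | Het | 0.067 | 0.001 | 0.038 | 0.409 | 0.001 | 0.320 | 0.043 | 0.000 | 0.024 |
| 5 | D | 0.5 | Hom | 0.065 | 0.002 | 0.105 | 0.499 | 0.018 | 0.434 | 0.763 | 0.009 | 0.687 |
| 5 | D | 0.8 | Het | 0.073 | 0.001 | 0.036 | 0.382 | 0.000 | 0.282 | 0.007 | 0.000 | 0.006 |
| 5 | D | 0.8 | Hom | 0.060 | 0.000 | 0.044 | 0.381 | 0.009 | 0.305 | 0.557 | 0.003 | 0.445 |
| 5 | Q | 0.5 | Het | 0.529 | 0.001 | 0.365 | 0.944 | 0.000 | 0.906 | 0.043 | 0.000 | 0.024 |
| 5 | Q | 0.5 | Hom | 0.472 | 0.001 | 0.333 | 0.883 | 0.018 | 0.836 | 0.983 | 0.012 | 0.972 |
| 5 | Q | 0.8 | Het | 0.543 | 0.000 | 0.371 | 0.913 | 0.000 | 0.866 | 0.019 | 0.000 | 0.012 |
| 5 | Q | 0.8 | Hom | 0.411 | 0.001 | 0.277 | 0.817 | 0.008 | 0.744 | 0.918 | 0.005 | 0.875 |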

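Table 4. Empirical power estimates when half of the rare variants are causal and 100% of them are deleterious.

Empirical power of $\mathrm{mFARVAT}_S^{\mathrm{Het}}$, $\mathrm{mFARVAT}_B^{\mathrm{Het}}$, $\mathrm{mFARVAT}_O^{\mathrm{Het}}$, $\mathrm{mFARVAT}_B^{\mathrm{Hom}}$, $\mathrm{mFARVAT}_S^{\mathrm{Hom}}$ and $\mathrm{mFARVAT}_O^{\mathrm{Hom}}$ was calculated for dichotomous and quantitative multiple phenotypes (Q = 2 and Q = 5) with homogeneous and heterogeneous effects and different correlations (c = 0.5 and c = 0.8) at the $10^{-4}$ significance level. Empirical power of FARVAT was calculated by applying a Bonferroni correction to the minimum p-value of the univariate association tests on the multiple phenotypes. Type: D, dichotomous; Q, quantitative. Eff: Het, heterogeneous effects; Hom, homogeneous effects.

| Q | Type | c | Eff | FARVAT SKAT | FARVAT Burden | FARVAT SKAT-O | $\mathrm{mFARVAT}^{\mathrm{Het}}$ SKAT | $\mathrm{mFARVAT}^{\mathrm{Het}}$ Burden | $\mathrm{mFARVAT}^{\mathrm{Het}}$ SKAT-O | $\mathrm{mFARVAT}^{\mathrm{Hom}}$ SKAT | $\mathrm{mFARVAT}^{\mathrm{Hom}}$ Burden | $\mathrm{mFARVAT}^{\mathrm{Hom}}$ SKAT-O |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | D | 0.5 | Het | 0.207 | 0.273 | 0.427 | 0.340 | 0.523 | 0.663 | 0.219 | 0.488 | 0.569 |
| 2 | D | 0.5 | Hom | 0.259 | 0.298 | 0.484 | 0.403 | 0.578 | 0.712 | 0.502 | 0.536 | 0.729 |
| 2 | D | 0.8 | Het | 0.425 | 0.634 | 0.788 | 0.572 | 0.831 | 0.910 | 0.379 | 0.826 | 0.863 |
| 2 | D | 0.8 | Hom | 0.488 | 0.684 | 0.834 | 0.678 | 0.857 | 0.937 | 0.812 | 0.851 | 0.948 |
| 2 | Q | 0.5 | Het | 0.178 | 0.254 | 0.419 | 0.285 | 0.422 | 0.575 | 0.139 | 0.382 | 0.461 |
| 2 | Q | 0.5 | Hom | 0.248 | 0.283 | 0.462 | 0.390 | 0.497 | 0.647 | 0.420 | 0.460 | 0.631 |
| 2 | Q | 0.8 | Het | 0.400 | 0.602 | 0.754 | 0.521 | 0.759 | 0.877 | 0.251 | 0.752 | 0.794 |
| 2 | Q | 0.8 | Hom | 0.434 | 0.646 | 0.767 | 0.562 | 0.781 | 0.883 | 0.645 | 0.768 | 0.901 |
| 5 | D | 0.5 | Het | 0.294 | 0.427 | 0.630 | 0.839 | 0.921 | 0.977 | 0.417 | 0.908 | 0.910 |
| 5 | D | 0.5 | Hom | 0.375 | 0.512 | 0.722 | 0.886 | 0.963 | 0.990 | 0.952 | 0.951 | 0.995 |
| 5 | D | 0.8 | Het | 0.609 | 0.807 | 0.920 | 0.974 | 0.997 | 0.999 | 0.645 | 0.997 | 0.997 |
| 5 | D | 0.8 | Hom | 0.665 | 0.845 | 0.944 | 0.977 | 0.999 | 1.000 | 0.999 | 0.998 | 1.000 |
| 5 | Q | 0.5 | Het | 0.266 | 0.383 | 0.582 | 0.729 | 0.817 | 0.917 | 0.276 | 0.812 | 0.817 |
| 5 | Q | 0.5 | Hom | 0.328 | 0.464 | 0.651 | 0.773 | 0.867 | 0.947 | 0.854 | 0.835 | 0.960 |
| 5 | Q | 0.8 | Het | 0.595 | 0.759 | 0.901 | 0.900 | 0.961 | 0.991 | 0.405 | 0.955 | 0.956 |
| 5 | Q | 0.8 | Hom | 0.631 | 0.782 | 0.911 | 0.919 | 0.973 | 0.996 | 0.963 | 0.969 | 0.998 |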

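Table 6. Empirical power estimates when half of the rare variants are causal and 50% of them are deleterious.

Empirical power of $\mathrm{mFARVAT}_S^{\mathrm{Het}}$, $\mathrm{mFARVAT}_B^{\mathrm{Het}}$, $\mathrm{mFARVAT}_O^{\mathrm{Het}}$, $\mathrm{mFARVAT}_B^{\mathrm{Hom}}$, $\mathrm{mFARVAT}_S^{\mathrm{Hom}}$ and $\mathrm{mFARVAT}_O^{\mathrm{Hom}}$ was calculated for dichotomous and quantitative multiple phenotypes (Q = 2 and Q = 5) with homogeneous and heterogeneous effects and different correlations (c = 0.5 and c = 0.8) at the $10^{-4}$ significance level. Empirical power of FARVAT was calculated by applying a Bonferroni correction to the minimum p-value of the univariate association tests on the multiple phenotypes. Type: D, dichotomous; Q, quantitative. Eff: Het, heterogeneous effects; Hom, homogeneous effects.

| Q | Type | c | Eff | FARVAT SKAT | FARVAT Burden | FARVAT SKAT-O | $\mathrm{mFARVAT}^{\mathrm{Het}}$ SKAT | $\mathrm{mFARVAT}^{\mathrm{Het}}$ Burden | $\mathrm{mFARVAT}^{\mathrm{Het}}$ SKAT-O | $\mathrm{mFARVAT}^{\mathrm{Hom}}$ SKAT | $\mathrm{mFARVAT}^{\mathrm{Hom}}$ Burden | $\mathrm{mFARVAT}^{\mathrm{Hom}}$ SKAT-O |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | D | 0.5 | Het | 0.041 | 0.000 | 0.024 | 0.094 | 0.000 | 0.064 | 0.024 | 0.000 | 0.011 |
| 2 | D | 0.5 | Hom | 0.054 | 0.001 | 0.036 | 0.148 | 0.004 | 0.113 | 0.196 | 0.001 | 0.139 |
| 2 | D | 0.8 | Het | 0.343 | 0.001 | 0.239 | 0.500 | 0.001 | 0.403 | 0.127 | 0.000 | 0.090 |
| 2 | D | 0.8 | Hom | 0.341 | 0.004 | 0.246 | 0.500 | 0.004 | 0.421 | 0.677 | 0.004 | 0.568 |
| 2 | Q | 0.5 | Het | 0.050 | 0.000 | 0.036 | 0.071 | 0.001 | 0.050 | 0.020 | 0.000 | 0.012 |
| 2 | Q | 0.5 | Hom | 0.062 | 0.001 | 0.037 | 0.118 | 0.000 | 0.088 | 0.139 | 0.001 | 0.102 |
| 2 | Q | 0.8 | Het | 0.352 | 0.001 | 0.253 | 0.426 | 0.000 | 0.327 | 0.087 | 0.000 | 0.062 |
| 2 | Q | 0.8 | Hom | 0.333 | 0.001 | 0.232 | 0.430 | 0.002 | 0.356 | 0.554 | 0.002 | 0.448 |
| 5 | D | 0.5 | Het | 0.079 | 0.000 | 0.053 | 0.420 | 0.001 | 0.318 | 0.017 | 0.000 | 0.018 |
| 5 | D | 0.5 | Hom | 0.130 | 0.000 | 0.086 | 0.532 | 0.014 | 0.468 | 0.771 | 0.009 | 0.703 |
| 5 | D | 0.8 | Het | 0.535 | 0.003 | 0.367 | 0.940 | 0.001 | 0.891 | 0.046 | 0.000 | 0.031 |
| 5 | D | 0.8 | Hom | 0.508 | 0.000 | 0.364 | 0.900 | 0.014 | 0.848 | 0.977 | 0.007 | 0.963 |
| 5 | Q | 0.5 | Het | 0.075 | 0.010 | 0.042 | 0.363 | 0.000 | 0.264 | 0.002 | 0.000 | 0.002 |
| 5 | Q | 0.5 | Hom | 0.115 | 0.003 | 0.075 | 0.449 | 0.012 | 0.389 | 0.594 | 0.009 | 0.505 |
| 5 | Q | 0.8 | Het | 0.562 | 0.001 | 0.377 | 0.901 | 0.000 | 0.841 | 0.017 | 0.000 | 0.010 |
| 5 | Q | 0.8 | Hom | 0.466 | 0.003 | 0.338 | 0.815 | 0.014 | 0.759 | 0.914 | 0.010 | 0.874 |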

We first examined the efficiency of the methods. Tables 1–6 confirm that the most efficient method depends on the disease model, which is usually unknown. For example, when all the rare causal variants have deleterious effects on all phenotypes, burden mFARVAT ($\mathrm{mFARVAT}_B$) generally outperforms the other approaches, but if there are variants with both deleterious and protective effects, SKAT mFARVAT ($\mathrm{mFARVAT}_S$) is the most efficient. SKAT-O mFARVAT ($\mathrm{mFARVAT}_O$) is not always the best, but its empirical power estimates are usually very close to those of the most efficient approach. These results are therefore consistent with previous findings that SKAT-O is robust and efficient for various disease models (Lee, Wu et al. 2012), and suggest that $\mathrm{mFARVAT}_O$ shares this property.

We also compared the performance of $\mathrm{mFARVAT}^{\mathrm{Het}}$ and $\mathrm{mFARVAT}^{\mathrm{Hom}}$ using simulated data. Tables 1–6 show that if the effects of each rare variant on the phenotypes are heterogeneous, $\mathrm{mFARVAT}^{\mathrm{Het}}$ performs better than $\mathrm{mFARVAT}^{\mathrm{Hom}}$, and vice versa. In addition, when the effects of causal variants go in different directions, as when some variants are deleterious while others are protective, the gap between the power of $\mathrm{mFARVAT}^{\mathrm{Het}}$ and $\mathrm{mFARVAT}^{\mathrm{Hom}}$ is larger than when the effects are in the same direction. Interestingly, for each method the difference in statistical power between 100% and 50% deleterious causal variants seems to be larger for family-based samples than for population-based designs (Lee, Emond et al. 2012).

Results for dichotomous phenotypes tend to be similar to those for quantitative phenotypes, although statistical power for the former is usually lower. This difference may be explained by the fact that the dichotomous phenotypes were obtained by dichotomizing quantitative phenotypes, which discards information. Moreover, overall power appears to be inversely related to the correlation among phenotypes, with some power lost when c is increased from 0.5 to 0.8. Notably, mFARVAT becomes more powerful as more phenotypes are included in the analysis.

Last, we compared the proposed method with univariate analyses using FARVAT (Choi, Lee et al. 2014). The power of the univariate analyses was calculated from the minimum p-value across phenotypes after Bonferroni correction. We considered two scenarios: either multiple phenotypes are associated with the variants, or only a single phenotype is. Results in Tables 1–6 show that in the former scenario multivariate rare variant analyses perform better than univariate analyses, whereas in the latter scenario univariate rare variant analyses outperform multivariate analyses (see Supplementary Table 2).
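For reference, the univariate comparator can be sketched as the Bonferroni-adjusted minimum p-value across the Q phenotype-specific tests, with empirical power taken as the fraction of replicates below the significance level. The p-values in the usage line are hypothetical; in practice each would come from a univariate FARVAT run. Function names are hypothetical.

```python
def bonferroni_min_p(pvalues):
    """Bonferroni-adjusted minimum p-value across Q univariate tests: min(1, Q * min p)."""
    return min(1.0, len(pvalues) * min(pvalues))


def empirical_power(replicate_pvalues, alpha=1e-4):
    """Fraction of replicates whose adjusted minimum p-value is below alpha."""
    hits = sum(bonferroni_min_p(ps) < alpha for ps in replicate_pvalues)
    return hits / len(replicate_pvalues)


# Three hypothetical replicates, each with Q = 2 univariate p-values.
power = empirical_power([[1e-6, 0.5], [0.01, 0.02], [4e-5, 1e-3]])   # 2 of 3 replicates
```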

3.2 Application to real data

We applied mFARVAT to whole-exome sequencing data from the Boston Early-onset COPD Study (Silverman, Chapman et al. 1998). Sequencing was performed at the University of Washington Center for Mendelian Genomics. Quality control was performed using PLINK (Purcell, Neale et al. 2007), vcfTools (Danecek, Auton et al. 2011), and PLINK/SEQ at Brigham and Women’s Hospital, and included filters on Mendelian error rates (< 1%), Hardy-Weinberg equilibrium (HWE, p > $10^{-8}$), and average sequencing depth (> 12). Relatedness of individuals was evaluated by comparing KCM and GRM. Heterozygous/homozygous genotype ratios, Mendelian errors, the proportion of variants in dbSNP and the proportion of non-synonymous variants were used to identify outlying samples. After additionally filtering out samples with missing phenotypes or covariates, 254 samples from 49 families remained.

We considered five COPD-related phenotypes: forced expiratory volume in one second pre-bronchodilator (FEVPRE); forced vital capacity post-bronchodilator (FVCPST); forced expiratory flow 25–75% pre-bronchodilator (DPRF2575); FEVPRE divided by FVCPRE (RATIO); and DPRF2575 divided by FVCPRE (F2575RAT). Sex, age, height, and pack-years of cigarette smoking were used to estimate BLUP offsets; genotypes were not used to estimate offsets. The correlation structure of the phenotypes is shown in Supplementary Table 3.

We assumed that variants with MAFs less than 5% were rare, and we considered only genes with at least two rare variants and a minor allele count (MAC) of at least four. As a result, 8,126 genes and 88,373 rare variants were analyzed. Our statistic requires a correlation matrix between individuals, Φ: if there is population substructure, the GRM should be used for Φ; otherwise the KCM is adequate. We found no significant population substructure, so the KCM was used for Φ. The Bonferroni-corrected genome-wide significance level at 0.05 is $6.15\times10^{-6}$ (0.05/8,126). QQ plots in Supplementary Figure 6 show the statistical validity of our analysis, and Manhattan plots are shown in Supplementary Figure 7. The 10 most significant results from $\mathrm{mFARVAT}^{\mathrm{Het}}$ and $\mathrm{mFARVAT}^{\mathrm{Hom}}$ are shown in Table 7. No genome-wide significant results were found in the association analysis of multiple phenotypes. The most significant result was for KRTAP5-9 on chromosome 11 with $\mathrm{mFARVAT}^{\mathrm{Het}}$ (p-value = $1.00\times10^{-4}$), whereas the p-value for KRTAP5-9 from $\mathrm{mFARVAT}^{\mathrm{Hom}}$ is $2.72\times10^{-4}$. The smaller p-value from $\mathrm{mFARVAT}^{\mathrm{Het}}$ may indicate that the effects of the rare variants in this gene on the multiple phenotypes are heterogeneous.
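The genome-wide threshold and the comparison with the smallest p-value per method in Table 7 amount to simple arithmetic, sketched here:

```python
n_genes = 8126
alpha = 0.05
threshold = alpha / n_genes   # Bonferroni-corrected genome-wide level

# Smallest p-value reported in Table 7 for each method.
top_p = {"Het": 1.00e-4, "Hom": 1.25e-4}
significant = {method: p < threshold for method, p in top_p.items()}
print(f"threshold = {threshold:.3g}")   # threshold = 6.15e-06
print(significant)                      # {'Het': False, 'Hom': False}
```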

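Table 7. mFARVAT analysis of COPD-related phenotypes.

The 10 most significant genes from $\mathrm{mFARVAT}_O^{\mathrm{Het}}$ and $\mathrm{mFARVAT}_O^{\mathrm{Hom}}$.

| Method | Chr | Gene | MAC | No. of variants | p-value |
|---|---|---|---|---|---|
| Het | 11 | KRTAP5-9 | 21 | 3 | $1.00\times10^{-4}$ |
| Het | 13 | DIAPH3 | 40 | 7 | $1.73\times10^{-4}$ |
| Het | 4 | ENAM | 82 | 9 | $3.16\times10^{-4}$ |
| Het | 2 | SLC8A1 | 5 | 3 | $3.38\times10^{-4}$ |
| Het | 3 | MFI2 | 32 | 5 | $4.30\times10^{-4}$ |
| Het | 11 | PLEKHA7 | 20 | 9 | $5.16\times10^{-4}$ |
| Het | 2 | SLC19A3 | 11 | 4 | $6.88\times10^{-4}$ |
| Het | 7 | ZNF736 | 8 | 2 | $7.94\times10^{-4}$ |
| Het | 15 | MGA | 49 | 11 | $9.08\times10^{-4}$ |
| Het | 8 | CA1 | 7 | 2 | $1.18\times10^{-3}$ |
| Hom | 13 | DIAPH3 | 40 | 7 | $1.25\times10^{-4}$ |
| Hom | 2 | SLC8A1 | 5 | 3 | $1.80\times10^{-4}$ |
| Hom | 11 | PLEKHA7 | 20 | 9 | $2.18\times10^{-4}$ |
| Hom | 11 | KRTAP5-9 | 21 | 3 | $2.72\times10^{-4}$ |
| Hom | 15 | POLG | 58 | 8 | $6.28\times10^{-4}$ |
| Hom | 2 | SLC19A3 | 11 | 4 | $6.37\times10^{-4}$ |
| Hom | 1 | ETV3L | 31 | 5 | $6.63\times10^{-4}$ |
| Hom | 7 | ZNF736 | 8 | 2 | $7.94\times10^{-4}$ |
| Hom | 5 | AFAP1L1 | 20 | 3 | $7.95\times10^{-4}$ |
| Hom | 3 | ANO10 | 32 | 3 | $9.57\times10^{-4}$ |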

4 DISCUSSION

Extended families have a complex correlation structure, and association analyses using extended families are very complicated, in particular for dichotomous phenotypes. For instance, the unbalanced nature of family-based samples can lead to inflation or deflation of sandwich estimators of the variance-covariance matrix, and results from generalized estimating equations can be invalid (Wang, Lee et al. 2013). An alternative approach is to use a generalized linear mixed model. However, calculating maximum likelihood estimators requires numerical integration, which is computationally very intensive, and approximations that avoid it can introduce serious bias (Gilmour, Anderson et al. 1985, Schall 1991). Therefore, in spite of the efficiency of extended families for rare variant association analysis, few methods have been proposed for family-based association analyses. In this report, we propose a new method of family-based analysis of rare variants associated with dichotomous phenotypes, quantitative phenotypes, or both. The proposed method enables multivariate analyses of extended families to detect rare variants. Extensive simulation studies show that mFARVAT works well for dichotomous and quantitative phenotypes. Our method is computationally efficient, and genome-wide association analyses are computationally feasible for extended families. In our analyses, an Intel(R) Xeon(R) E5-2620 0 CPU at 2.00 GHz, with a single node and 80 GB of memory, required six minutes to analyze the real data on two phenotypes. mFARVAT is implemented in C++ and freely downloadable from http://healthstat.snu.ac.kr/software/mfarvat.

However, in spite of the analytical flexibility and efficiency of the method, some limitations remain. First, the GRM should ideally be used as the correlation matrix Φ to provide robustness against population substructure; however, proper estimation of the GRM requires large-scale common variant data. In the absence of such data, the transmission disequilibrium test (Laird, Horvath et al. 2000) provides an alternative. Second, the proposed statistics are intended for retrospective designs, and power loss is expected if samples are gathered prospectively. It has been shown that an appropriate choice of offset minimizes power loss in certain scenarios, but further investigation is still necessary. Third, mFARVAT cannot be used directly to analyze X-linked variants. The distribution of X-linked genetic variants in males differs from that in females, and thus different statistics are required for males and females. This issue will be investigated in future work.

Over the last decade, it has been recognized that a substantial amount of genetic risk remains unidentified, and much effort has been expended to investigate it. Our methods provide an efficient strategy to analyze rare variant associations in family-based samples, and they may increase our understanding of heritable diseases.

Supplementary Material

Supp Info

Table 2. Empirical power estimates when all rare variants are causal and 80% of them are deleterious.

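Empirical power of $\mathrm{mFARVAT}_S^{Het}$, $\mathrm{mFARVAT}_B^{Het}$, $\mathrm{mFARVAT}_O^{Het}$, $\mathrm{mFARVAT}_S^{Hom}$, $\mathrm{mFARVAT}_B^{Hom}$ and $\mathrm{mFARVAT}_O^{Hom}$ was calculated for dichotomous and quantitative multiple phenotypes (Q = 2 and Q = 5) with homogeneous and heterogeneous effects and different correlations (c = 0.5 and c = 0.8) at the $10^{-4}$ significance level. Empirical power of FARVAT was calculated by applying a Bonferroni correction to the minimum p-value of the univariate association tests on the multiple phenotypes. In the table, nphe is the number of phenotypes (Q); Type is D (dichotomous) or Q (quantitative); Cor is the correlation c; and Eff is the effect type (Het, heterogeneous; Hom, homogeneous).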

nphe Type Cor Eff | FARVAT: SKAT Burden SKAT-O | mFARVAT(Het): SKAT Burden SKAT-O | mFARVAT(Hom): SKAT Burden SKAT-O
2 D 0.5 Het | 0.129 0.148 0.289 | 0.231 0.327 0.464 | 0.135 0.302 0.372
2 D 0.5 Hom | 0.150 0.194 0.326 | 0.252 0.389 0.525 | 0.342 0.356 0.550
2 D 0.8 Het | 0.111 0.146 0.270 | 0.191 0.263 0.422 | 0.092 0.242 0.316
2 D 0.8 Hom | 0.133 0.164 0.283 | 0.212 0.313 0.447 | 0.264 0.279 0.462
2 Q 0.5 Het | 0.355 0.414 0.627 | 0.523 0.592 0.808 | 0.301 0.581 0.692
2 Q 0.5 Hom | 0.368 0.467 0.670 | 0.546 0.678 0.854 | 0.718 0.660 0.892
2 Q 0.8 Het | 0.331 0.376 0.608 | 0.451 0.491 0.736 | 0.190 0.492 0.561
2 Q 0.8 Hom | 0.319 0.407 0.608 | 0.461 0.564 0.753 | 0.577 0.536 0.790
5 D 0.5 Het | 0.214 0.270 0.479 | 0.707 0.763 0.903 | 0.272 0.745 0.764
5 D 0.5 Hom | 0.228 0.328 0.512 | 0.750 0.844 0.931 | 0.887 0.814 0.952
5 D 0.8 Het | 0.179 0.215 0.386 | 0.629 0.590 0.819 | 0.143 0.562 0.586
5 D 0.8 Hom | 0.195 0.290 0.453 | 0.622 0.705 0.855 | 0.742 0.672 0.881
5 Q 0.5 Het | 0.577 0.574 0.831 | 0.962 0.922 0.997 | 0.459 0.923 0.931
5 Q 0.5 Hom | 0.546 0.643 0.839 | 0.934 0.961 0.997 | 0.992 0.954 1.000
5 Q 0.8 Het | 0.527 0.490 0.765 | 0.915 0.802 0.972 | 0.238 0.791 0.804
5 Q 0.8 Hom | 0.477 0.566 0.773 | 0.865 0.846 0.971 | 0.953 0.826 0.989
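Each entry of Table 2 (and of Table 5) is an empirical power: the proportion of simulated replicates whose p-value falls below $10^{-4}$. For FARVAT, the p-value of a replicate is the Bonferroni-adjusted minimum of its Q univariate p-values. A minimal Python sketch of both calculations, assuming the per-replicate p-values are already available; the function names and the toy p-values are hypothetical and are not taken from the FARVAT or mFARVAT software.

```python
ALPHA = 1e-4  # significance level used for the power tables


def bonferroni_min_p(pvalues):
    """Bonferroni-adjusted minimum p-value over the Q univariate tests, capped at 1."""
    return min(1.0, len(pvalues) * min(pvalues))


def empirical_power(replicate_pvalues, alpha=ALPHA):
    """Fraction of simulated replicates whose p-value falls below alpha."""
    hits = sum(1 for p in replicate_pvalues if p < alpha)
    return hits / len(replicate_pvalues)


# Hypothetical univariate p-values for Q = 2 phenotypes in four replicates.
replicates = [
    [3e-5, 0.2],   # adjusted 6e-5  -> detected
    [8e-5, 0.01],  # adjusted 1.6e-4 -> not detected
    [1e-6, 1e-6],  # adjusted 2e-6  -> detected
    [0.5, 0.04],   # adjusted 0.08  -> not detected
]
adjusted = [bonferroni_min_p(ps) for ps in replicates]
print(empirical_power(adjusted))  # 0.5
```

Comparing Q times the minimum p-value with α is equivalent to comparing the minimum p-value with α/Q; the cap at 1 only matters when adjusted p-values are reported.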

Table 5. Empirical power estimates when half of the rare variants are causal and 80% of them are deleterious.

Empirical power of $\mathrm{mFARVAT}_S^{Het}$, $\mathrm{mFARVAT}_B^{Het}$, $\mathrm{mFARVAT}_O^{Het}$, $\mathrm{mFARVAT}_S^{Hom}$, $\mathrm{mFARVAT}_B^{Hom}$ and $\mathrm{mFARVAT}_O^{Hom}$ was calculated for dichotomous and quantitative multiple phenotypes (Q = 2 and Q = 5) with homogeneous and heterogeneous effects and different correlations (c = 0.5 and c = 0.8) at the $10^{-4}$ significance level. Empirical power of FARVAT was calculated by applying a Bonferroni correction to the minimum p-value of the univariate association tests on the multiple phenotypes. Columns are as in Table 2.

nphe Type Cor Eff | FARVAT: SKAT Burden SKAT-O | mFARVAT(Het): SKAT Burden SKAT-O | mFARVAT(Hom): SKAT Burden SKAT-O
2 D 0.5 Het | 0.128 0.043 0.174 | 0.215 0.105 0.306 | 0.098 0.091 0.182
2 D 0.5 Hom | 0.167 0.055 0.217 | 0.295 0.165 0.392 | 0.369 0.132 0.419
2 D 0.8 Het | 0.392 0.114 0.453 | 0.565 0.215 0.640 | 0.219 0.200 0.384
2 D 0.8 Hom | 0.413 0.138 0.490 | 0.601 0.301 0.705 | 0.751 0.272 0.793
2 Q 0.5 Het | 0.112 0.045 0.164 | 0.169 0.079 0.238 | 0.072 0.062 0.135
2 Q 0.5 Hom | 0.159 0.052 0.203 | 0.240 0.133 0.327 | 0.301 0.112 0.348
2 Q 0.8 Het | 0.375 0.112 0.410 | 0.469 0.152 0.526 | 0.137 0.135 0.267
2 Q 0.8 Hom | 0.391 0.145 0.477 | 0.514 0.245 0.611 | 0.625 0.223 0.672
5 D 0.5 Het | 0.184 0.059 0.245 | 0.703 0.317 0.769 | 0.118 0.288 0.363
5 D 0.5 Hom | 0.254 0.108 0.345 | 0.773 0.518 0.848 | 0.907 0.458 0.913
5 D 0.8 Het | 0.581 0.152 0.604 | 0.968 0.469 0.975 | 0.209 0.452 0.568
5 D 0.8 Hom | 0.612 0.267 0.696 | 0.953 0.690 0.977 | 0.996 0.662 0.997
5 Q 0.5 Het | 0.237 0.049 0.194 | 0.612 0.211 0.651 | 0.062 0.187 0.234
5 Q 0.5 Hom | 0.237 0.090 0.311 | 0.669 0.353 0.736 | 0.779 0.292 0.789
5 Q 0.8 Het | 0.581 0.135 0.568 | 0.912 0.321 0.927 | 0.094 0.305 0.352
5 Q 0.8 Hom | 0.573 0.197 0.631 | 0.875 0.512 0.911 | 0.943 0.459 0.953

Acknowledgments

This work was supported by the National Research Foundation of Korea (NRF), as funded by the Korean Government [NRF-2014S1A2A2028559]; the Industrial Core Technology Development Program, as funded by the Ministry of Trade, Industry and Energy, Korea (MOTIE) [10040176]; the Basic Science Research Program through the NRF, as funded by the Ministry of Education [NRF-2013R1A1A2010437]; and the National Institutes of Health [P01 HL105339, R01 HL075478, R01 HL113264]. Sequencing for the Boston Early-Onset COPD Study was provided by the University of Washington Center for Mendelian Genomics and was funded by the National Human Genome Research Institute and the National Heart, Lung and Blood Institute [1U54HG006493 to D.N., J.S. and M.B.].

Appendix

Numerical algorithm to calculate $p_{\mathrm{mFARVAT}_O^{Hom}}$

If we let

$$Z = \Psi^{1/2} W \quad \text{and} \quad \bar{Z} = Z \mathbf{1}_{MQ} \left(\mathbf{1}_{MQ}^t \mathbf{1}_{MQ}\right)^{-1},$$

the projection matrix onto the space spanned by $\bar{Z}$ becomes

$$\Pi = \bar{Z} \left(\bar{Z}^t \bar{Z}\right)^{-1} \bar{Z}^t.$$

If we let

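$$u = \Psi^{-1/2} S^t \mathbf{1}_Q \frac{1}{\sqrt{\mathbf{1}_Q^t T^t \Phi A \Phi T \mathbf{1}_Q}}, \qquad u \sim \mathrm{MVN}(0, I_{MQ}),$$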

$MS_\rho^{Hom}$ becomes

$$MS_\rho^{Hom} = u^t \Psi^{1/2} W R W \Psi^{1/2} u = (1-\rho)\, u^t Z Z^t u + \rho M^2\, u^t \bar{Z} \bar{Z}^t u.$$

As shown by Lee et al. (Lee, Wu et al. 2012), if we let

$$\tau(\rho) = M^2 \rho\, \bar{Z}^t \bar{Z} + (1-\rho) \frac{\bar{Z}^t Z Z^t \bar{Z}}{\bar{Z}^t \bar{Z}},$$

we have

$$MS_\rho^{Hom} = (1-\rho)\, u^t (I_{MQ} - \Pi) Z Z^t (I_{MQ} - \Pi) u + 2(1-\rho)\, u^t (I_{MQ} - \Pi) Z Z^t \Pi u + \tau(\rho)\, u^t \Pi u,$$

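where $u^t (I_{MQ} - \Pi) Z Z^t (I_{MQ} - \Pi) u$, $u^t (I_{MQ} - \Pi) Z Z^t \Pi u$ and $u^t \Pi u$ are mutually independent. Therefore, if we let $P_{\min} = \min\{p_{MS_0^{Hom}}, p_{MS_{0.1^2}^{Hom}}, \ldots, p_{MS_{0.5^2}^{Hom}}, p_{MS_1^{Hom}}\}$, we have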

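$$P\left(MS_{\rho_0}^{Hom} < Q_{\rho_0}(P_{\min}), \ldots, MS_{\rho_L}^{Hom} < Q_{\rho_L}(P_{\min})\right) = E\left\{P\left(MS_{\rho_0}^{Hom} < Q_{\rho_0}(P_{\min}), \ldots, MS_{\rho_L}^{Hom} < Q_{\rho_L}(P_{\min}) \mid u^t \Pi u = \eta\right)\right\}.$$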

The p-value $p_{\mathrm{mFARVAT}_O^{Hom}}$ is one minus this probability. The conditional probability inside the expectation can be calculated numerically, as suggested by Lee et al. (Lee, Emond et al. 2012, Lee, Wu et al. 2012):

$$P\left(MS_{\rho_0}^{Hom} < Q_{\rho_0}(P_{\min}), \ldots, MS_{\rho_L}^{Hom} < Q_{\rho_L}(P_{\min}) \mid u^t \Pi u = \eta\right).$$
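The three-term decomposition of $MS_\rho^{Hom}$ can be checked numerically. The plain-Python sketch below assumes only that Z is an n × K matrix and u a length-n vector, where K is the length of the all-ones vector used to form $\bar{Z}$ (so $K^2$ plays the role of the squared-count factor in the display). It also assumes, as the step to $(1-\rho)u^tZZ^tu + \rho M^2 u^t\bar{Z}\bar{Z}^tu$ implies, that $R = (1-\rho)I + \rho \mathbf{1}\mathbf{1}^t$. The random inputs are illustrative only and the function names are hypothetical.

```python
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def zt_times(Z, x):
    """Return Z^t x, where Z is an n x K matrix stored as a list of rows."""
    K = len(Z[0])
    return [sum(row[j] * xi for row, xi in zip(Z, x)) for j in range(K)]


def ms_direct(Z, u, rho):
    """u^t Z R Z^t u with the assumed R = (1 - rho) I_K + rho 1_K 1_K^t."""
    v = zt_times(Z, u)
    return (1 - rho) * dot(v, v) + rho * sum(v) ** 2


def ms_decomposed(Z, u, rho):
    """(1-rho) u'(I-Pi)ZZ'(I-Pi)u + 2(1-rho) u'(I-Pi)ZZ'Pi u + tau(rho) u'Pi u."""
    K = len(Z[0])
    zbar = [sum(row) / K for row in Z]            # Zbar = Z 1_K (1_K^t 1_K)^{-1}
    zz = dot(zbar, zbar)                           # Zbar^t Zbar (a scalar)
    coef = dot(zbar, u) / zz
    pi_u = [coef * z for z in zbar]                # Pi u
    res_u = [ui - pu for ui, pu in zip(u, pi_u)]   # (I - Pi) u
    eta = dot(u, pi_u)                             # u^t Pi u
    a = zt_times(Z, res_u)                         # Z^t (I - Pi) u
    b = zt_times(Z, pi_u)                          # Z^t Pi u
    zt_zbar = zt_times(Z, zbar)                    # Z^t Zbar
    tau = K ** 2 * rho * zz + (1 - rho) * dot(zt_zbar, zt_zbar) / zz
    return (1 - rho) * dot(a, a) + 2 * (1 - rho) * dot(a, b) + tau * eta


# Illustrative random inputs; any Z whose row-mean vector is nonzero works.
random.seed(1)
n, K = 6, 4
Z = [[random.gauss(0, 1) for _ in range(K)] for _ in range(n)]
u = [random.gauss(0, 1) for _ in range(n)]
for rho in (0.0, 0.1 ** 2, 0.5 ** 2, 1.0):
    print(rho, ms_direct(Z, u, rho), ms_decomposed(Z, u, rho))
```

For every ρ on the grid the two printed values agree up to floating-point rounding, which is what allows the conditional argument on $u^t \Pi u = \eta$.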

Numerical algorithm to calculate $p_{\mathrm{mFARVAT}_O^{Het}}$

We assume

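$$Z = \mathrm{var}(\mathrm{vec}(S))^{1/2} (I_Q \otimes W) \quad \text{and} \quad \bar{Z} = Z \mathbf{1}_{MQ} \left(\mathbf{1}_{MQ}^t \mathbf{1}_{MQ}\right)^{-1}.$$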

Then the projection matrix onto the space spanned by $\bar{Z}$ is

$$\Pi = \bar{Z} \left(\bar{Z}^t \bar{Z}\right)^{-1} \bar{Z}^t.$$

If we let

$$u = \mathrm{var}(\mathrm{vec}(S))^{-1/2}\, \mathrm{vec}(S), \qquad u \sim \mathrm{MVN}(0, I_{MQ}),$$

$MS_\rho^{Het}$ becomes

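$$MS_\rho^{Het} = u^t\, \mathrm{var}(\mathrm{vec}(S))^{1/2} (I_Q \otimes W) R (I_Q \otimes W)\, \mathrm{var}(\mathrm{vec}(S))^{1/2} u = (1-\rho)\, u^t Z Z^t u + \rho (MQ)^2\, u^t \bar{Z} \bar{Z}^t u.$$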
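In the heterogeneous statistic, the factor $I_Q \otimes W$ applies the same variant weight matrix W within each phenotype's block of vec(S). Assuming S is stored as an M × Q matrix (rows are variants, columns are phenotypes) and vec stacks columns, this is the standard identity $(I_Q \otimes W)\,\mathrm{vec}(S) = \mathrm{vec}(WS)$. A small stdlib Python sketch that checks it; the storage convention for S is an assumption not stated in this appendix, and the weights and scores are made-up numbers.

```python
def identity(n):
    """n x n identity matrix as a list of rows."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def kron(A, B):
    """Kronecker product A (x) B for matrices stored as lists of rows."""
    return [[a * b for a in arow for b in brow] for arow in A for brow in B]


def vec(S):
    """Stack the columns of S (list of rows) into a single vector."""
    return [S[i][j] for j in range(len(S[0])) for i in range(len(S))]


def matmul(A, B):
    """Matrix product A B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def matvec(A, x):
    """Matrix-vector product A x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


# Hypothetical example: M = 3 variants, Q = 2 phenotypes, diagonal weights.
W = [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.5]]
S = [[1, 4], [2, 5], [3, 6]]
Q = len(S[0])
print(matvec(kron(identity(Q), W), vec(S)))  # [2.0, 2.0, 1.5, 8.0, 5.0, 3.0]
print(vec(matmul(W, S)))                     # [2.0, 2.0, 1.5, 8.0, 5.0, 3.0]
```

Working with WS directly avoids forming the MQ × MQ Kronecker matrix, which grows quickly with the number of phenotypes.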

As suggested by Lee et al. (Lee, Wu et al. 2012), if we let

$$\tau(\rho) = (MQ)^2 \rho\, \bar{Z}^t \bar{Z} + (1-\rho) \frac{\bar{Z}^t Z Z^t \bar{Z}}{\bar{Z}^t \bar{Z}},$$

we have

$$MS_\rho^{Het} = (1-\rho)\, u^t (I_{MQ} - \Pi) Z Z^t (I_{MQ} - \Pi) u + 2(1-\rho)\, u^t (I_{MQ} - \Pi) Z Z^t \Pi u + \tau(\rho)\, u^t \Pi u.$$

Therefore, if we let $P_{\min} = \min\{p_{MS_0^{Het}}, p_{MS_{0.1^2}^{Het}}, \ldots, p_{MS_{0.5^2}^{Het}}, p_{MS_1^{Het}}\}$, we have

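$$P\left(MS_{\rho_0}^{Het} < Q_{\rho_0}(P_{\min}), \ldots, MS_{\rho_L}^{Het} < Q_{\rho_L}(P_{\min})\right) = E\left\{P\left(MS_{\rho_0}^{Het} < Q_{\rho_0}(P_{\min}), \ldots, MS_{\rho_L}^{Het} < Q_{\rho_L}(P_{\min}) \mid u^t \Pi u = \eta\right)\right\}.$$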

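The conditional probability $P\left(MS_{\rho_0}^{Het} < Q_{\rho_0}(P_{\min}), \ldots, MS_{\rho_L}^{Het} < Q_{\rho_L}(P_{\min}) \mid u^t \Pi u = \eta\right)$ can be calculated numerically as in Lee et al. (Lee, Emond et al. 2012, Lee, Wu et al. 2012), and $p_{\mathrm{mFARVAT}_O^{Het}}$ is one minus the unconditional probability above.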
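Because $\Pi$ projects onto the one-dimensional span of $\bar{Z}$ and $u \sim \mathrm{MVN}(0, I_{MQ})$, $\eta = u^t \Pi u$ follows a $\chi^2_1$ distribution under the null, so the outer expectation in both appendix sections is a one-dimensional integral against the $\chi^2_1$ density. A minimal Python sketch of that outer integration, assuming a user-supplied function `cond_prob(eta)` (a hypothetical name) that returns the conditional probability; computing that conditional probability itself, as in Lee et al., is not shown, and the toy function below is only a smooth stand-in.

```python
import math


def expect_over_chi2_1(cond_prob, z_max=8.0, n=2000):
    """Approximate E{cond_prob(eta)} for eta ~ chi-square with 1 df.

    Writes eta = z**2 with z ~ N(0, 1), so the integrand has no singularity
    at eta = 0, and applies composite Simpson's rule on [0, z_max]; symmetry
    in z gives the factor 2. Probability mass beyond z_max is ignored.
    """
    if n % 2:
        n += 1  # Simpson's rule needs an even number of panels
    h = z_max / n
    total = 0.0
    for k in range(n + 1):
        z = k * h
        weight = 1 if k in (0, n) else (4 if k % 2 else 2)
        phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        total += weight * cond_prob(z * z) * phi
    return 2.0 * total * h / 3.0


def pvalue_from_conditional(cond_prob):
    """One minus the unconditional probability that all MS_rho stay below their thresholds."""
    return 1.0 - expect_over_chi2_1(cond_prob)


def toy_cond(eta):
    """Hypothetical smooth stand-in for the conditional probability."""
    return math.exp(-eta / 2.0)


print(round(expect_over_chi2_1(toy_cond), 4))       # 0.7071, i.e. 1/sqrt(2)
print(round(pvalue_from_conditional(toy_cond), 4))  # 0.2929
```

The substitution $\eta = z^2$ is a design choice that keeps Simpson's rule accurate near $\eta = 0$, where the $\chi^2_1$ density itself is unbounded.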

Footnotes

Availability and Implementation: The software is freely available at http://healthstat.snu.ac.kr/software/mfarvat/, implemented in C++ and supported on Linux and MS Windows.

References

  1. Choi S, Lee S, Cichon S, Nöthen MM, Lange C, Park T, Won S. FARVAT: a family-based rare variant association test. Bioinformatics. 2014;30(22):3197–3205. doi: 10.1093/bioinformatics/btu496.
  2. Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, Handsaker RE, Lunter G, Marth GT, Sherry ST, McVean G, Durbin R; 1000 Genomes Project Analysis Group. The variant call format and VCFtools. Bioinformatics. 2011;27(15):2156–2158. doi: 10.1093/bioinformatics/btr330.
  3. Gilmour AR, Anderson RD, Rae A. The analysis of binomial data by a generalized linear mixed model. Biometrika. 1985;72:539–599.
  4. Gray-McGuire C, Bochud M, Goodloe R, Elston RC. Genetic association tests: a method for the joint analysis of family and case-control data. Hum Genomics. 2009;4(1):2–20. doi: 10.1186/1479-7364-4-1-2.
  5. Laird NM, Horvath S, Xu X. Implementing a unified approach to family-based tests of association. Genet Epidemiol. 2000;19(Suppl 1):S36–S42. doi: 10.1002/1098-2272(2000)19:1+<::AID-GEPI6>3.0.CO;2-M.
  6. Lange C, DeMeo DL, Laird NM. Power and design considerations for a general class of family-based association tests: quantitative traits. Am J Hum Genet. 2002;71(6):1330–1341. doi: 10.1086/344696.
  7. Lee S, Emond MJ, Bamshad MJ, Barnes KC, Rieder MJ, Nickerson DA; NHLBI GO Exome Sequencing Project-ESP Lung Project Team; Christiani DC, Wurfel MM, Lin X. Optimal unified approach for rare-variant association testing with application to small-sample case-control whole-exome sequencing studies. Am J Hum Genet. 2012;91(2):224–237. doi: 10.1016/j.ajhg.2012.06.007.
  8. Lee S, Teslovich TM, Boehnke M, Lin X. General framework for meta-analysis of rare variants in sequencing association studies. Am J Hum Genet. 2013;93(1):42–53. doi: 10.1016/j.ajhg.2013.05.010.
  9. Lee S, Wu MC, Lin X. Optimal tests for rare variant effects in sequencing association studies. Biostatistics. 2012;13(4):762–775. doi: 10.1093/biostatistics/kxs014.
  10. Lee Y, Park S, Moon S, Lee J, Elston RC, Lee W, Won S. On the analysis of a repeated measure design in genome-wide association analysis. Int J Environ Res Public Health. 2014;11(12):12283–12303. doi: 10.3390/ijerph111212283.
  11. Madsen BE, Browning SR. A groupwise association test for rare mutations using a weighted sum statistic. PLoS Genet. 2009;5(2):e1000384. doi: 10.1371/journal.pgen.1000384.
  12. Manolio TA, Collins FS, Cox NJ, Goldstein DB, Hindorff LA, Hunter DJ, McCarthy MI, Ramos EM, Cardon LR, Chakravarti A, Cho JH, Guttmacher AE, Kong A, Kruglyak L, Mardis E, Rotimi CN, Slatkin M, Valle D, Whittemore AS, Boehnke M, Clark AG, Eichler EE, Gibson G, Haines JL, Mackay TF, McCarroll SA, Visscher PM. Finding the missing heritability of complex diseases. Nature. 2009;461(7265):747–753. doi: 10.1038/nature08494.
  13. McPeek MS, Wu X, Ober C. Best linear unbiased allele-frequency estimation in complex pedigrees. Biometrics. 2004;60(2):359–367. doi: 10.1111/j.0006-341X.2004.00180.x.
  14. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, Maller J, Sklar P, de Bakker PI, Daly MJ, Sham PC. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81(3):559–575. doi: 10.1086/519795.
  15. Schaffner SF, Foo C, Gabriel S, Reich D, Daly MJ, Altshuler D. Calibrating a coalescent simulation of human genome sequence variation. Genome Res. 2005;15(11):1576–1583. doi: 10.1101/gr.3709305.
  16. Schall R. Estimation in generalized linear models with random effects. Biometrika. 1991;78:719–727.
  17. Schifano ED, Li L, Christiani DC, Lin X. Genome-wide association analysis for multiple continuous secondary phenotypes. Am J Hum Genet. 2013;92(5):744–759. doi: 10.1016/j.ajhg.2013.04.004.
  18. Shi G, Rao DC. Optimum designs for next-generation sequencing to discover rare variants for common complex disease. Genet Epidemiol. 2011;35(6):572–579. doi: 10.1002/gepi.20597.
  19. Silverman EK, Chapman HA, Drazen JM, Weiss ST, Rosner B, Campbell EJ, O'Donnell WJ, Reilly JJ, Ginns L, Mentzer S, Wain J, Speizer FE. Genetic epidemiology of severe, early-onset chronic obstructive pulmonary disease. Risk to relatives for airflow obstruction and chronic bronchitis. Am J Respir Crit Care Med. 1998;157(6 Pt 1):1770–1778. doi: 10.1164/ajrccm.157.6.9706014.
  20. Thornton T, McPeek MS. Case-control association testing with related individuals: a more powerful quasi-likelihood score test. Am J Hum Genet. 2007;81(2):321–337. doi: 10.1086/519497.
  21. Thornton T, Tang H, Hoffmann TJ, Ochs-Balcom HM, Caan BJ, Risch N. Estimating kinship in admixed populations. Am J Hum Genet. 2012;91(1):122–138. doi: 10.1016/j.ajhg.2012.05.024.
  22. van der Sluis S, Posthuma D, Dolan CV. TATES: efficient multivariate genotype-phenotype analysis for genome-wide association studies. PLoS Genet. 2013;9(1):e1003235. doi: 10.1371/journal.pgen.1003235.
  23. Wang X, Lee S, Zhu X, Redline S, Lin X. GEE-based SNP set association test for continuous and discrete traits in family-based association studies. Genet Epidemiol. 2013;37(8):778–786. doi: 10.1002/gepi.21763.
  24. Won S, Kim W, Lee S, Lee Y, Sung J, Park T. Family-based association analysis: a fast and efficient method of multivariate association analysis with multiple variants. BMC Bioinformatics. 2015;16:46. doi: 10.1186/s12859-015-0484-5.
  25. Won S, Lange C. A general framework for robust and efficient association analysis in family-based designs: quantitative and dichotomous phenotypes. Stat Med. 2013;32(25):4482–4498. doi: 10.1002/sim.5865.
  26. Wu MC, Lee S, Cai T, Li Y, Boehnke M, Lin X. Rare-variant association testing for sequencing data with the sequence kernel association test. Am J Hum Genet. 2011;89(1):82–93. doi: 10.1016/j.ajhg.2011.05.029.
  27. Yip W, De G, Raby AB, Laird N. Identifying causal rare variants of disease through family-based analysis of Genetics Analysis Workshop 17 data set. BMC Proc. 2011;5(Suppl 9):S21. doi: 10.1186/1753-6561-5-S9-S21.
