Summary
We develop linear mixed models (LMM) and functional linear mixed models (FLMM) for gene-based tests of association between a quantitative trait and genetic variants on pedigrees. The effect of a major gene is modeled as a fixed effect, the contribution of polygenes is modeled as a random effect, and the correlations among pedigree members are modeled via inbreeding/kinship coefficients. F-statistics and chi-squared likelihood ratio test (LRT) statistics based on the LMM and FLMM are constructed to test for association. We show empirically that the F-distributed statistics provide good control of the type I error rate. The F-test statistics of the LMM have similar or higher power than the FLMM, kernel-based famSKAT, and burden test famBT. The F-statistics of the FLMM perform well when analyzing a combination of rare and common variants. For small samples, the LRT statistics of the FLMM control the type I error rate well at the nominal levels α = 0.01 and 0.05. For moderate/large samples, the LRT statistics of the FLMM control the type I error rates well. The LRT statistics of the LMM can lead to inflated type I error rates. The proposed models are useful in whole genome and whole exome association studies of complex traits.
Keywords: rare variants, common variants, complex diseases, functional data analysis, linear mixed models, functional linear mixed models
1. Introduction
Next generation sequencing allows nearly complete evaluation of genetic variation including several million common (e.g., ≥ 1% population frequency) and rare variants (e.g., < 1% population frequency) (Abecasis et al., 2012; Lek et al., 2016; Rusk & Kiermer, 2008; Tennessen et al., 2012). Thus, it is important to properly reduce the high dimensionality of next-generation sequencing data in order to draw useful information. Rare variants have very low frequencies, so the power of single variant-by-variant association analysis of rare variants is limited. It is therefore necessary to group the rare variants to perform gene-based analysis. In recent years, there has been great interest in developing statistical methods to analyze rare variants using grouped region-based tests.
The existing gene-based analysis methods fall into three classes: (1) burden tests (BT), (2) kernel tests, and (3) fixed effect regression models. Burden tests are based on collapsing rare variants to a single variable which is then used to test for an association with the phenotypes (Han & Pan, 2010; Li & Leal, 2008; Morris & Zeggini, 2010; Price et al., 2010). The kernel-based tests, such as sequence kernel association test (SKAT), its optimal unified test (SKAT-O), a combined sum test of rare and common variant effects (SKAT-C), and family-based SKAT (famSKAT), all aggregate the association between variants and phenotypes via a kernel matrix, which measures the similarity between individuals (Chen et al., 2013; Ionita-Laza et al., 2013; Lee et al., 2012; Wu et al., 2011). It was found that SKAT/SKAT-O tests have higher power than burden tests, such as the combined collapsing and multivariate method, nonparametric weighted sum test (Madsen & Browning, 2009), and the cohort allelic sums test (Morgenthaler & Thilly, 2007).
Fixed regression models can be either traditional additive models or functional regression models. By using functional data analysis techniques, a class of fixed effect functional models has been developed to test associations between complex traits (i.e., a quantitative or binary or survival trait) and genetic variants for unrelated population samples adjusting for covariates (Fan et al., 2013, 2014, 2016a; Luo et al., 2011, 2012, 2013; Vsevolozhskaya et al., 2014, 2016). The functional models are very flexible and can analyze rare variants or common variants or a combination of the two. The basic idea of functional regression models is to treat multiple genetic variants of an individual in a human population as a realization of an underlying stochastic process (Ross, 1996). The genome of an individual is viewed as a stochastic function which contains both physical position and linkage disequilibrium (LD) information of the genetic markers. In these models, the genome of an individual in a chromosome region is treated as a continuum of sequence data rather than discrete observations.
For unrelated samples, functional regression-based statistics have been built to test for association between phenotypic traits and genetic variants. Extensive simulations and real data analysis demonstrate that the fixed effect functional models perform better in major gene analysis than SKAT/SKAT-O/SKAT-C (Fan et al., 2016b). To date, the functional regression models have only been developed to analyze unrelated population data except for the famFLM method (Svishcheva et al., 2015). Since members of a pedigree are correlated with each other, existing functional regression models cannot be directly applied to familial data. There is a need to extend the models to analyze extended pedigrees, properly taking pedigree member relatedness into account (Jiang et al., 2018).
Here, we consider additive linear mixed models (LMM) which are widely used for quantitative trait association studies since they have two remarkable features. First, LMM accurately control the type I error rates and properly correct for confounding arising from population stratification, family structure, and cryptic relatedness. Second, LMM can be applied to samples with arbitrary combinations of related and unrelated individuals. However, LMM so far are mainly designed for testing the association of common variants with quantitative traits (Astle & Balding, 2009; Aulchenko et al., 2007; Kang et al., 2010; Korte et al., 2012; Lippert et al., 2011; Listgarten et al., 2012; Yang et al., 2014; Yu et al., 2006; Zhou & Stephens, 2012). They are typically carried out by testing the association of a single variant, one at a time. There has been very little research on how to utilize LMM to perform gene-based analysis of rare variants or a combination of rare and common variants to analyze extended families.
To take advantage of both LMM and functional data analysis simultaneously, we build functional linear mixed models (FLMM) to connect the phenotypic traits to the genetic variants. Our motivation arises from the superior performance of functional regression models in analyzing unrelated data, and the expectation that this advantage should carry over to the analysis of pedigree data. As in the functional regression models developed previously, the effect of genetic variants is modeled by a genetic effect function. The contribution of polygenes is modeled as a separate random variation, and the correlation of pedigree members is taken care of by kinship coefficients. The LMM and FLMM are an extension of traditional variance component models (Amos, 1994; Lange, 2002). We evaluate performance of the LMM and FLMM via extensive simulations and illustrate their application by analyzing a complex trait, refractive error, with exome chip genotyping of Amish pedigrees (Wojciechowski et al., 2009a, 2009b; Musolf et al., 2017, 2018).
2. Methods
Consider a sample consisting of multiple families. To simplify notation, we consider one pedigree with n individuals labeled i = 1, 2, …,n; each individual i is preceded by all of his/her ancestors. We denote the quantitative traits of the pedigree members by a trait vector Y = (y1,y2, …,yn)′ where ′ denotes the transpose. All individuals in each family are sequenced in a genomic region that has m genetic variants with ordered genetic locations 0 ≤ t1 < ⋯ <tm = T. Here, we assume that the base pair position tℓ of each variant is known. We normalize the region [t1, T] to be [0, 1]. For the ith individual, let Xi = (xi(t1),… ,xi(tm))′ denote the genotypes, coded as the number of minor alleles at each of the m variants, and Zi = (zi1,…, zic)′ denote the covariates.
For the n individuals who are phenotyped and sequenced, let Ω be an n × n matrix containing diagonal elements Ωii = 1 + hi, where hi is the inbreeding coefficient for individual i, and off-diagonal elements Ωij = 2ϕij. The parameter ϕij is the kinship coefficient between individuals i and j, the probability that a randomly chosen allele at a given locus from individual i is identical by descent to a randomly chosen allele from individual j conditional on their ancestral relationship.
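Because each individual is preceded by all of his or her ancestors, Ω can be filled in recursively. The following is a minimal sketch rather than the implementation used in this work; the function names `kinship_matrix` and `omega_matrix` and the parent-index encoding are assumptions made for illustration, and founders are assumed to be non-inbred and mutually unrelated.

```python
def kinship_matrix(parents):
    """Kinship coefficients phi_ij for one pedigree.

    parents[i] is (father, mother) as 0-based indices, or (None, None)
    for a founder. Every parent must appear before its children.
    """
    n = len(parents)
    phi = [[0.0] * n for _ in range(n)]
    for i, (f, m) in enumerate(parents):
        if f is None:
            # Assumed: founders are non-inbred and unrelated to each other.
            phi[i][i] = 0.5
            continue
        # Kinship with each earlier individual j is the average over i's parents.
        for j in range(i):
            phi[i][j] = phi[j][i] = 0.5 * (phi[f][j] + phi[m][j])
        # Self-kinship is (1 + h_i) / 2, where h_i is the kinship of the parents.
        phi[i][i] = 0.5 * (1.0 + phi[f][m])
    return phi


def omega_matrix(parents):
    """Omega = 2 * phi, so the diagonal is 1 + h_i and off-diagonals are 2 * phi_ij."""
    return [[2.0 * v for v in row] for row in kinship_matrix(parents)]
```

For a nuclear family with two unrelated parents, this gives Ωij = 0.5 for parent-offspring and full-sibling pairs, and a child of two full siblings has Ωii = 1.25.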
2.1. Linear Mixed Effect Models
Linear Mixed Models (LMM). Here we assume that the trait vector Y = (y1,y2,…,yn)′ follows a multivariate normal distribution. By using genotype data directly, we may relate the genetic variants to the trait adjusting for covariates by the following additive LMM
$$y_i = \alpha_0 + Z_i'\alpha + \sum_{\ell=1}^{m} x_i(t_\ell)\beta_\ell + G_i + e_i, \tag{1}$$
where α0 is an overall mean, α is a c × 1 vector of fixed regression coefficients of the covariates, βℓ is the effect of the genetic variant xi(tℓ), (G1,…, Gn)′ is a normal random vector with mean 0 and covariance matrix $\sigma_G^2\Omega$, and (e1,…, en)′ is a normal vector of error terms with mean 0 and covariance matrix $\sigma_e^2 I_{n\times n}$. Here $\sigma_G^2$ is a polygenic variance component, Gi is an additive polygenic variation, and In×n is an identity matrix. We assume that Gi and ei are independent. Before fitting the LMM (1), QR decomposition can be applied to the genotype data to decompose the genotype matrix into the product of an orthogonal matrix Q and a triangular matrix R via the Gram-Schmidt process.
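The QR step can be illustrated with a short modified Gram-Schmidt routine. This is only a sketch: the function name `gram_schmidt_qr` is hypothetical, the genotype matrix is assumed to have full column rank (linearly dependent variant columns would need to be dropped or pivoted first), and library routines such as `numpy.linalg.qr` use Householder reflections rather than Gram-Schmidt.

```python
import numpy as np


def gram_schmidt_qr(X):
    """Modified Gram-Schmidt QR of an n x m matrix with full column rank.

    Returns Q (n x m, orthonormal columns) and R (m x m, upper triangular)
    such that X = Q @ R.
    """
    X = np.asarray(X, dtype=float)
    n, m = X.shape
    Q = np.zeros((n, m))
    R = np.zeros((m, m))
    for j in range(m):
        v = X[:, j].copy()
        for k in range(j):
            # Remove the component of column j along the k-th orthonormal vector.
            R[k, j] = Q[:, k] @ v
            v -= R[k, j] * Q[:, k]
        R[j, j] = np.linalg.norm(v)  # zero here would signal rank deficiency
        Q[:, j] = v / R[j, j]
    return Q, R
```

The columns of Q span the same space as the genotype columns, so replacing the genotype matrix by Q leaves the fitted values of the fixed-effect part unchanged while removing collinearity among variants.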
General FLMM.
We denote the genetic variant function (GVF) of the ith individual by Xi(t),t ∈ [0,1]. By using the genetic information Xi, we may estimate the related GVF Xi(t). To model the relationship between the trait and the GVF Xi(t), consider the following FLMM
$$y_i = \alpha_0 + Z_i'\alpha + \int_0^1 X_i(t)\beta(t)\,dt + G_i + e_i, \tag{2}$$
where β(t) is the genetic effect of GVF Xi(t) at the location t, and the other terms are the same as in the additive LMM (1). In the FLMM (2), the GVF Xi(t) and genetic effect function β(t) are assumed to be continuous. The continuity of the GVF Xi(t) can be relaxed by considering a model where only β(t) is a smooth function; see below.
Beta-smooth only FLMM.
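To remove the assumption of the continuity of the GVF Xi(t) in the FLMM (2), a simplified functional linear mixed model is obtained by replacing the integration term $\int_0^1 X_i(t)\beta(t)\,dt$ in model (2) by the summation term $\sum_{\ell=1}^{m} x_i(t_\ell)\beta(t_\ell)$. That is, we have
$$y_i = \alpha_0 + Z_i'\alpha + \sum_{\ell=1}^{m} x_i(t_\ell)\beta(t_\ell) + G_i + e_i, \tag{3}$$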
where β(tℓ) is the genetic effect at the location tℓ, and the other terms are similar to those in the additive LMM (1) and the general FLMM (2). In our previous work, we show that a beta-smooth only model performs very similarly to the general functional linear models in applications and simulation studies for population data, in which the Gi term is not included (Fan et al., 2013, 2014, 2016a).
2.2. Revised FLMM
Expansion of the Genetic Effect Function. The genetic effect function β(t) in models (2) and (3) is assumed to be smooth. One may expand it using B-spline or Fourier basis functions. We expand the genetic effect function β(t) using a series of Kβ basis functions as β(t) = ψ(t)′β, where $\beta = (\beta_1, \ldots, \beta_{K_\beta})'$ is a vector of coefficients and $\psi(t) = (\psi_1(t), \ldots, \psi_{K_\beta}(t))'$. We consider two types of basis functions: (1) the B-spline basis function where ψk(t) = Bk(t), k = 1, …, Kβ; and (2) the Fourier basis function where ψ1(t) = 1, ψ2r+1(t) = sin(2πrt), and ψ2r(t) = cos(2πrt), r = 1, …, (Kβ − 1)/2. Here for the Fourier basis, Kβ is a positive odd integer (de Boor, 2001; Ferraty & Romain, 2010; Horvath & Kokoszka, 2012; Ramsay et al., 2009; Ramsay & Silverman, 2005).
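As an illustration of the Fourier case, the basis can be evaluated on any set of normalized positions in [0, 1]. The sketch below is not taken from the fda package; the function name `fourier_basis` and its column layout are assumptions chosen to mirror the indexing in the text.

```python
import numpy as np


def fourier_basis(t, K):
    """Evaluate K Fourier basis functions (K odd) at positions t in [0, 1].

    Column 0 holds psi_1(t) = 1. For r = 1, ..., (K - 1) / 2, the text's
    psi_{2r}(t) = cos(2 pi r t) and psi_{2r+1}(t) = sin(2 pi r t) are stored
    in 0-based columns 2r - 1 and 2r.
    """
    if K < 1 or K % 2 == 0:
        raise ValueError("K must be a positive odd integer")
    t = np.atleast_1d(np.asarray(t, dtype=float))
    B = np.ones((t.size, K))
    for r in range(1, (K - 1) // 2 + 1):
        B[:, 2 * r - 1] = np.cos(2 * np.pi * r * t)
        B[:, 2 * r] = np.sin(2 * np.pi * r * t)
    return B


# Example: three basis functions at t = 0 and t = 0.25.
basis_values = fourier_basis([0.0, 0.25], 3)
```

Given a coefficient vector β, the genetic effect function at the same positions is then `basis_values @ beta`.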
Estimation of the GVF.
To estimate the genetic variant functions Xi(t) from the genotypes Xi, we use an ordinary linear square smoother. Let ϕk(t),k = 1, …,K, be a series of K basis functions, such as the B-spline basis and Fourier basis functions, and let ϕ(t) = (ϕ1(t), …, ϕK(t))′. Let Φ denote the m by K matrix containing the values ϕk(tℓ). Using the discrete realizations Xi = (xi(t1),…,xi(tm))′, we may estimate the genetic variant function Xi(t) using an ordinary linear square smoother as follows
$$\hat{X}_i(t) = (x_i(t_1), \ldots, x_i(t_m))\,\Phi[\Phi'\Phi]^{-1}\phi(t). \tag{4}$$
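The smoother amounts to an ordinary least squares regression of the observed genotypes on the basis functions evaluated at the variant positions. A minimal sketch with a Fourier basis follows; the helper names `fourier_basis` and `smooth_gvf` are hypothetical, and Φ is assumed to have full column rank (m ≥ K with distinct positions).

```python
import numpy as np


def fourier_basis(t, K):
    """K Fourier basis functions (K odd) evaluated at t; returns len(t) x K."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    cols = [np.ones_like(t)]
    for r in range(1, (K - 1) // 2 + 1):
        cols += [np.cos(2 * np.pi * r * t), np.sin(2 * np.pi * r * t)]
    return np.column_stack(cols)


def smooth_gvf(x, t_obs, t_new, K):
    """Least squares estimate of X_i(t) at t_new from genotypes x observed at t_obs."""
    Phi = fourier_basis(t_obs, K)                            # m x K
    coef, *_ = np.linalg.lstsq(Phi, np.asarray(x, float), rcond=None)
    # coef equals (Phi' Phi)^{-1} Phi' x when Phi has full column rank.
    return fourier_basis(t_new, K) @ coef
```

When the genotype vector lies exactly in the span of the basis, the smoother reproduces it; otherwise it returns the closest function in that span in the least squares sense.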
Revised FLMM.
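Here we expand Xi(t) by the ordinary linear square smoother (4). We expand the genetic effect function β(t) as β(t) = ψ(t)′β. Replacing Xi(t) in the FLMM (2) by $\hat{X}_i(t)$ in (4) and β(t) by the expansion ψ(t)′β, we have a revised linear mixed model
$$y_i = \alpha_0 + Z_i'\alpha + \Big[(x_i(t_1), \ldots, x_i(t_m))\,\Phi[\Phi'\Phi]^{-1}\int_0^1 \phi(t)\psi(t)'\,dt\Big]\beta + G_i + e_i = \alpha_0 + Z_i'\alpha + W_i'\beta + G_i + e_i, \tag{5}$$
where $W_i' = (x_i(t_1), \ldots, x_i(t_m))\,\Phi[\Phi'\Phi]^{-1}\int_0^1 \phi(t)\psi(t)'\,dt$. In the above revised model (5), one needs to calculate Φ[Φ′Φ]−1 and $\int_0^1 \phi(t)\psi(t)'\,dt$ in order to get Wi. In the statistical R package fda or Matlab, code is readily available to calculate them (Ramsay et al., 2009).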
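To make the construction of Wi concrete, the sketch below builds the design rows for model (5) with Fourier bases, computing the cross-product integral of the two bases by the trapezoid rule on a fine grid. It is an illustration under stated assumptions, not the fda routine: `fourier_basis` and `flmm_design` are hypothetical names, and the grid size is an arbitrary choice.

```python
import numpy as np


def fourier_basis(t, K):
    """K Fourier basis functions (K odd) evaluated at t; returns len(t) x K."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    cols = [np.ones_like(t)]
    for r in range(1, (K - 1) // 2 + 1):
        cols += [np.cos(2 * np.pi * r * t), np.sin(2 * np.pi * r * t)]
    return np.column_stack(cols)


def flmm_design(X, t_obs, K, K_beta, n_grid=2001):
    """Rows are W_i' = x_i' Phi (Phi'Phi)^{-1} J with J = int_0^1 phi(t) psi(t)' dt.

    X is the n x m genotype matrix; t_obs holds the m normalized positions.
    """
    Phi = fourier_basis(t_obs, K)                     # m x K
    grid = np.linspace(0.0, 1.0, n_grid)
    P = fourier_basis(grid, K)                        # basis used for X_i(t)
    S = fourier_basis(grid, K_beta)                   # basis used for beta(t)
    # Trapezoid weights on the grid.
    w = np.full(n_grid, 1.0 / (n_grid - 1))
    w[0] = w[-1] = 0.5 / (n_grid - 1)
    J = P.T @ (w[:, None] * S)                        # K x K_beta
    # Column i of coef is (Phi'Phi)^{-1} Phi' x_i.
    coef = np.linalg.lstsq(Phi, np.asarray(X, float).T, rcond=None)[0]
    return coef.T @ J                                 # n x K_beta
```

With K = Kβ and the Fourier basis, J is diagonal with entries 1, 0.5, 0.5, …, which provides a quick sanity check of the numerical integration.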
Revised beta-smooth only FLMM.
In model (3), β(tℓ) is introduced as the genetic effect at the location tℓ. We assume that the genetic effect function β(t) is a function of the genetic location t. Therefore, β(tℓ), ℓ = 1,2, …,m, are the values of function β(t) at the m genetic locations. The genetic effect function β(t) is assumed to be smooth. One may expand it by B-spline or Fourier or linear spline basis functions as above. Replacing β(tℓ) by the expansion, the model (3) can be revised as
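$$y_i = \alpha_0 + Z_i'\alpha + \Big[\sum_{\ell=1}^{m} x_i(t_\ell)\psi(t_\ell)'\Big]\beta + G_i + e_i = \alpha_0 + Z_i'\alpha + W_i'\beta + G_i + e_i, \tag{6}$$
where $W_i' = \sum_{\ell=1}^{m} x_i(t_\ell)\psi(t_\ell)'$. In model (3) and its revised version (6), we use the raw genotype data Xi = (xi(t1), …, xi(tm))′ directly in the analysis. In addition, we assume that the genetic effect function β(t) is smooth. Hence, we call the models the “beta-smooth only” approach.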
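In the beta-smooth only model, each Wi is simply the genotype-weighted sum of the basis vectors at the variant positions, so the whole design is a single matrix product. A minimal sketch, where the function name `beta_smooth_design` is hypothetical and the basis matrix is assumed to have been evaluated beforehand:

```python
import numpy as np


def beta_smooth_design(X, Psi):
    """Rows are W_i' = sum_l x_i(t_l) psi(t_l)'.

    X   : n x m genotype matrix (minor allele counts).
    Psi : m x K_beta matrix whose l-th row is psi(t_l)'.
    """
    return np.asarray(X, dtype=float) @ np.asarray(Psi, dtype=float)


# Toy example: one individual, three variants, a two-function basis.
W_example = beta_smooth_design([[0, 1, 2]], [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
```

No smoothing of the genotypes is involved, which is why this variant does not require continuity of the GVF.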
2.3. Dealing with Missing Genotype Data
If some genotype data are missing, the FLMM can be modified to analyze the data. For example, assume there is no genotype information at the first variant for the ith individual (i.e., we only have Xi = (?, xi(t2), …, xi(tm))′). Let Φ1 denote the (m − 1) by K matrix containing the values ϕk(tj), where j ∈ {2, …, m}. Then, we may revise the estimate (4) as
$$\hat{X}_i(t) = (x_i(t_2), \ldots, x_i(t_m))\,\Phi_1[\Phi_1'\Phi_1]^{-1}\phi(t). \tag{7}$$
Note that the estimate (7) only depends on the available genotype data (xi(t2), …, xi(tm))′. Hence, each individual's GVF is estimated by his/her own non-missing data, a practical advantage of the functional data analysis approach. Using (7), one may revise the FLMM (2) to be a form of model (5) accordingly.
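In practice this means dropping, for each individual separately, the rows of the basis matrix that correspond to missing genotypes before solving the least squares problem. The sketch below assumes missing calls are encoded as NaN and that enough genotypes remain for the reduced basis matrix to have full column rank; the name `smooth_coef_missing` is hypothetical.

```python
import numpy as np


def smooth_coef_missing(x, Phi):
    """Basis coefficients of one individual's GVF using only non-missing genotypes.

    x   : length-m genotype vector with np.nan marking missing calls.
    Phi : m x K matrix of basis values phi_k(t_l).
    The estimated GVF is X_hat(t) = phi(t)' coef.
    """
    x = np.asarray(x, dtype=float)
    Phi = np.asarray(Phi, dtype=float)
    keep = ~np.isnan(x)                    # positions with observed genotypes
    coef, *_ = np.linalg.lstsq(Phi[keep], x[keep], rcond=None)
    return coef
```

Because the mask is computed per individual, different pedigree members may contribute different subsets of variants without any imputation step.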
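If, for example, Xi = (?, xi(t2), …, xi(tm))′ where xi(t1) is missing, we may revise the beta-smooth only FLMM (3) as
$$y_i = \alpha_0 + Z_i'\alpha + \sum_{\ell=2}^{m} x_i(t_\ell)\beta(t_\ell) + G_i + e_i. \tag{8}$$
The revised FLMM (8) only depends on the available genotype data (xi(t2), …, xi(tm))′, and it can be revised accordingly to be a form of model (6) as
$$y_i = \alpha_0 + Z_i'\alpha + \Big[\sum_{\ell=2}^{m} x_i(t_\ell)\psi(t_\ell)'\Big]\beta + G_i + e_i.$$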
2.4. Likelihood of LMM and FLMM
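The log-likelihood is defined by
$$L = -\frac{n}{2}\log(2\pi) - \frac{1}{2}\log|\Sigma| - \frac{1}{2}(Y - EY)'\Sigma^{-1}(Y - EY).$$
In the log-likelihood L, the ith element of the mean component EY is $\alpha_0 + Z_i'\alpha + \sum_{\ell=1}^{m} x_i(t_\ell)\beta_\ell$ for the additive LMM (1) and $\alpha_0 + Z_i'\alpha + W_i'\beta$ for the FLMM (5) and (6), and Σ is an n × n variance-covariance matrix defined as $\Sigma = \sigma_G^2\Omega + \sigma_e^2 I_{n\times n}$. Note that typically the variance-covariance matrix differs from pedigree to pedigree. Under the normality assumption of the LMM, the marginal likelihood has a closed form and maximum likelihood estimation can be performed conveniently for quantitative traits.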
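The multivariate normal log-likelihood for one pedigree can be evaluated stably through a Cholesky factor of the covariance matrix. The sketch below assumes the usual variance-component form Σ = σG²Ω + σe²I and takes the fitted mean vector as given; the function name `lmm_loglik` and its argument names are hypothetical.

```python
import numpy as np


def lmm_loglik(y, mean, Omega, sigma_g2, sigma_e2):
    """Gaussian log-likelihood of one pedigree's trait vector.

    Assumes Sigma = sigma_g2 * Omega + sigma_e2 * I (labeled assumption).
    """
    y = np.asarray(y, dtype=float)
    r = y - np.asarray(mean, dtype=float)           # residual Y - EY
    n = y.size
    Sigma = sigma_g2 * np.asarray(Omega, dtype=float) + sigma_e2 * np.eye(n)
    L = np.linalg.cholesky(Sigma)                    # Sigma = L L'
    z = np.linalg.solve(L, r)                        # z'z = r' Sigma^{-1} r
    logdet = 2.0 * np.sum(np.log(np.diag(L)))        # log |Sigma|
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + z @ z)
```

For a sample of several independent pedigrees, the total log-likelihood is the sum of these per-pedigree terms, each with its own Ω.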
2.5. Parameter Estimation and Test Statistics
To test for association between the quantitative trait and the genetic variants, the null hypothesis is H0: β = 0. Under the null the FLMM (5) and (6) simplify to
$$y_i = \alpha_0 + Z_i'\alpha + G_i + e_i. \tag{9}$$
The null linear mixed model (9) is also a null model of LMM (1). The LMM (1) or FLMM (5) or (6) and the null model (9) are nested. To facilitate parameter estimation, we use Cholesky decomposition of the covariance structure. Briefly, let Σ = LL′, where L is the Cholesky factor. Let us denote X = L−1Y. Then, we have Var(X) = L−1Σ(L′)−1 = In. Therefore, the transformed traits X are uncorrelated with unit variance and can be analyzed as independent data. By using the transformed traits X, we may reformulate the null model (9) as
$$X = L^{-1}(\mathbf{1}_n\alpha_0 + Z\alpha) + \varepsilon, \tag{10}$$
where $\mathbf{1}_n$ is an n × 1 vector of ones, $Z = (Z_1, \ldots, Z_n)'$ is the n × c covariate matrix, and ε is a vector of independent standard normal variables. Similarly, the FLMM (5) or (6) can be re-written as
$$X = L^{-1}(\mathbf{1}_n\alpha_0 + Z\alpha + W\beta) + \varepsilon, \tag{11}$$
where W = (W1, …, Wn)′. The LMM (1) can be re-written using X as a form of model (11). By fitting models (10) and (11), we may test the null H0: β = 0 using an F-distributed or a χ2-distributed likelihood ratio test (LRT) statistic.
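The whitening step and the two test statistics can be sketched as follows. This is an illustration under explicit assumptions, not the authors' code: Σ is taken as already estimated (for example under the null model), both designs are assumed to have full column rank, the statistics are the standard nested linear-model F statistic and the Gaussian LRT n·log(RSS0/RSS1), and all function names are hypothetical.

```python
import numpy as np


def whiten(Sigma, *arrays):
    """Premultiply each array by L^{-1}, where Sigma = L L' (Cholesky)."""
    L = np.linalg.cholesky(Sigma)
    return [np.linalg.solve(L, a) for a in arrays]


def rss(y, A):
    """Residual sum of squares from an ordinary least squares fit of y on A."""
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ coef
    return r @ r


def f_and_lrt(y, Z, W, Sigma):
    """F and LRT statistics for H0: beta = 0 after Cholesky whitening."""
    n = len(y)
    A0 = np.column_stack([np.ones(n), Z])        # null design: intercept + covariates
    A1 = np.column_stack([A0, W])                # full design adds the genetic terms
    y_t, A0_t, A1_t = whiten(Sigma, np.asarray(y, float), A0, A1)
    rss0, rss1 = rss(y_t, A0_t), rss(y_t, A1_t)
    q = A1.shape[1] - A0.shape[1]                # number of tested coefficients
    df_resid = n - A1.shape[1]
    F = ((rss0 - rss1) / q) / (rss1 / df_resid)  # compare with F(q, df_resid)
    lrt = n * np.log(rss0 / rss1)                # compare with chi-squared(q)
    return F, lrt
```

The two statistics are monotone transformations of each other; they differ in the reference distribution used to obtain p-values, which is one reason their type I error behavior can differ in small samples.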
2.6. Simulation Studies
To evaluate the performance of the test statistics, we simulated data to estimate empirical type I error rates and power levels. In our simulations, we consider a variant to be rare if its minor allele frequency (MAF) is less than 0.03. Two scenarios were considered: (1) some variants are common and the rest are rare; (2) all variants are rare.
Pedigree Template A of 25 Families.
We first simulated 25 families by randomly choosing progeny sizes from a negative binomial distribution. We assumed that each child within the second generation has a 25% chance of having offspring. The final structure of the pedigrees included 228 individuals (119 males and 109 females; 70 founders and 158 nonfounders). The pedigree size ranged from 4 to 24 with an average value of 9.12.
Pedigree Template B of 50 Families.
By doubling the 25 families, the pedigree structures included 456 individuals (238 males and 218 females; 140 founders and 316 nonfounders) within 50 families.
Pedigree Template C of 75 Families.
By tripling the 25 families, the pedigree structures included 684 individuals (357 males and 327 females; 210 founders and 474 nonfounders) within 75 families.
Genetic Variants.
The sequence data are of European ancestry from 10,000 chromosomes covering a 1 Mb region, simulated by Yun Li at the University of North Carolina, Chapel Hill using the calibrated coalescent model as programmed in COSI (Schaffner et al., 2005). The sequence data were generated using COSI's calibrated best-fit models, and the generated European haplotypes mimic CEPH Utah individuals with ancestry from northern and western Europe in terms of site frequency spectrum and LD patterns (The International HapMap Consortium, 2007). To evaluate empirical type I error rates and power, we used a gene-dropping simulation approach, first randomly sampling two haplotypes for each founder. Then for each nonfounder in the pedigree, we chose one haplotype at random from each of his or her parents. Genotypes were constructed by summing the two haplotypes of each individual to determine the number of minor alleles at each bp position within the 1 Mb region, assuming no recombination events in this small region during meioses.
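The gene-dropping procedure can be sketched in a few lines. The code below is illustrative only: the function names are hypothetical, haplotypes are encoded as tuples of 0/1 minor-allele indicators, parents must precede their children, and founder haplotypes are drawn from the pool with replacement, which is an assumption since the sampling scheme is not specified above.

```python
import random


def gene_drop(parents, haplotype_pool, rng):
    """Assign two haplotypes to every pedigree member without recombination.

    parents[i] is (father, mother) as 0-based indices, or (None, None)
    for a founder. Returns a list of (paternal, maternal) haplotype pairs.
    """
    haps = []
    for f, m in parents:
        if f is None:
            # Founder: draw two haplotypes from the pool (with replacement, assumed).
            haps.append((rng.choice(haplotype_pool), rng.choice(haplotype_pool)))
        else:
            # Nonfounder: inherit one whole haplotype from each parent.
            haps.append((rng.choice(haps[f]), rng.choice(haps[m])))
    return haps


def genotypes(haps):
    """Minor allele counts (0, 1, or 2) per variant for each individual."""
    return [[a + b for a, b in zip(h1, h2)] for h1, h2 in haps]
```

Passing a seeded `random.Random` instance as `rng` makes each simulated dataset reproducible.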
Type I Error Simulations.
To evaluate type I error rates of the F-test and LRT statistics, we utilized the three pedigree templates A, B, and C described above. For each pedigree template set, we generated phenotype data sets using the model
| (12) |
where α0 = −4.60, zi1 is a dichotomous covariate taking on values 0 and 1 with a probability of 0.5, zi2 is a continuous covariate from a standard normal distribution N(0,1), σG = 0.2, σe = 0.75, and (G1, …, Gn)′ is generated as a normal vector with mean 0 and a covariance matrix $\sigma_G^2\Omega$.
Genotypes were selected from variants in 3, 6, …, 27, 30 kb subregions randomly selected from the 1 Mb region. Notice that the trait values are not related to the genotypes, and so the null hypothesis holds. For each simulation scenario, $10^6$ phenotype-genotype datasets were generated; for each dataset, we fit the models and calculated the test statistics and related p-values. Then, an empirical type I error rate was calculated as the proportion of the $10^6$ p-values which were smaller than a given α level.
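The empirical rate is a simple proportion, and its Monte Carlo uncertainty follows from the binomial distribution. A small sketch, with hypothetical function names:

```python
import math


def empirical_rate(p_values, alpha):
    """Proportion of p-values strictly smaller than alpha."""
    p_values = list(p_values)
    return sum(p < alpha for p in p_values) / len(p_values)


def mc_standard_error(alpha, n_rep):
    """Binomial standard error of an empirical rate when the true rate is alpha."""
    return math.sqrt(alpha * (1.0 - alpha) / n_rep)
```

For example, with one million replicates the standard error at α = 0.00001 is about 3.2 × 10⁻⁶, so empirical rates at the smallest nominal level carry substantial relative uncertainty.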
Empirical Power Simulations.
To evaluate the power of the F-test and LRT statistics, trait values were generated for each individual based upon the genotypes. To do this, we considered a linear mixed model. We simulated data sets under the alternative hypothesis by randomly selecting subregions to obtain causal variants. First, we generated genotypes of m variants in a selected subregion, similar to the type I error simulations. Then, M of the m variants were randomly selected to be causal, yielding causal genotypes (xi(u1), …, xi(uM)). For each dataset, the causal variants are the same for all the individuals in the dataset, but we allow the causal variants to be different from dataset to dataset. Then, we generated the quantitative traits by
| (13) |
where α0, zi1, zi2, and (G1, …, Gn)′ were the same as in the type I error model (12), (xi(u1), …, xi(uM))′ were genotypes of the ith individual at the causal variants, and the βs are additive effects for the causal variants defined as follows. In the model (13), we used |βj| = c| log10(MAFj)|/2, where c is defined below and MAFj is the MAF of the jth variant. Three different settings were considered: 5%, 10%, and 15% of variants in the subregions are chosen as causal variants. Here for the scenario where some variants are common and the rest are rare, the percentage is over all variants; and for the scenario where all variants are rare, the percentage is over all rare variants. When 5%, 10%, and 15% of the variants were causal, c = log(30)/k, log(20)/k, and log(15)/k, respectively. For the template C of 75 two- or three-generation families, k increases and genetic effect sizes decrease as region sizes increase:
| (14) |
In addition to varying the percentage of causal variants in the subregion, we also varied the direction of effect. We considered situations where (i) all causal variants have positive effects; (ii) 20%/80% causal variants have negative/positive effects; and (iii) 50%/50% causal variants have negative/positive effects. Burden tests are expected to be most powerful when all of the causal variants have effects in the same direction [e.g. under scenario (i)]. For each setting, 1,000 datasets were simulated to calculate the empirical power as the proportion of p-values which are smaller than a given α level.
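The effect-size rule |βj| = c|log10(MAFj)|/2, together with a chosen fraction of negative effects, can be sketched as below. The function name `causal_effects` is hypothetical, the constant c is passed in directly rather than derived from k, and the choice of which causal variants receive a negative sign is assumed to be random.

```python
import math
import random


def causal_effects(mafs, c, frac_negative, rng):
    """Signed effects with |beta_j| = c * |log10(MAF_j)| / 2.

    mafs          : minor allele frequencies of the M causal variants.
    frac_negative : fraction of causal variants given a negative effect
                    (0.0, 0.2, or 0.5 in the three scenarios).
    """
    M = len(mafs)
    n_neg = round(frac_negative * M)
    negative = set(rng.sample(range(M), n_neg))   # assumed: chosen at random
    return [
        (-1.0 if j in negative else 1.0) * c * abs(math.log10(maf)) / 2.0
        for j, maf in enumerate(mafs)
    ]
```

Under this rule rarer variants receive larger effects: with c = 1, a variant with MAF 0.001 has |β| = 1.5 while one with MAF 0.1 has |β| = 0.5.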
2.7. Analysis of Refractive Error Data in the Myopia Family Study
To evaluate the performance of the F-test and LRT statistics in a more realistic setting, we exploit exome chip genotypes and a quantitative trait, refractive error measured in Diopters, from Amish families that are part of the Myopia Family Study (Wojciechowski et al., 2009a, b). After sample quality checks using a thorough and rigorous data cleaning pipeline, which included checks for chromosomal aberrations, gender, Hardy-Weinberg equilibrium, relatedness, duplicates, and genotype quality, 300 genotyped as well as phenotyped individuals were available for analysis [see Wojciechowski et al. (2009a) for details of quality control on phenotype data and Musolf et al. (2017, 2018) for details of exome chip genotype data quality control processes]. To completely specify the pedigree structures, we included non-genotyped or non-phenotyped individuals who shared the same family with phenotyped and genotyped family members. The connected pedigrees contained 409 pedigree members who are used to calculate the kinship coefficients. A total number of 52,035 autosomal variants were included in the study within 8,282 genes which contain at least two variants and 1,572 genes which contain at least 6 variants. As refractive error is non-normally distributed in this sample, inverse normal rank transformation was applied prior to association analysis. We adjusted for gender since it is significantly associated with refractive error in the null model (p-value = 0.049).
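An inverse normal rank transformation replaces each trait value by the normal quantile of its rank. Several conventions exist for mapping ranks to probabilities; the sketch below uses (rank − 0.5)/n with average ranks for ties, which is an assumption, since the exact variant used in the analysis is not stated. The function name is hypothetical.

```python
from statistics import NormalDist


def inverse_normal_rank(values, offset=0.5):
    """Map values to standard normal quantiles of their (tie-averaged) ranks."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values starting at sorted position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0          # 1-based average rank of the run
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    nd = NormalDist()
    return [nd.inv_cdf((r - offset) / n) for r in ranks]
```

The transformed trait preserves the ordering of the original measurements while removing skewness and heavy tails, which is what the normality assumption of the mixed models requires.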
2.8. Functional Data Analysis Parameters
In the data analysis and simulations described above, we used functions in the R package fda (Ramsay et al., 2009) to create the basis functions. In the simulations presented in the main text and Supplementary Materials I, the order of the B-spline basis was 4, the number of B-spline basis functions was K = Kβ = 20, and the number of Fourier basis functions was K = Kβ = 21. To make sure that the results are valid and stable, we examined a wide range of parameters: 6 ≤ K = Kβ ≤ 27 for the B-spline and Fourier basis functions.
Since most genes contain only a few variants in the Myopia Family Study data, we took a conservative strategy when analyzing the data: the order of the B-spline basis was 4, the number of B-spline basis functions was K = Kβ = 6, and the number of Fourier basis functions was K = Kβ = 7.
3. Results
3.1. Simulation Results
In this section, we present simulation results for the type I error rates and power levels using bar plots, where the statistics evaluated in the figures are identified using the abbreviations defined in Table 1. In the table, five F-distributed statistics, famSKAT, and famBT are presented. The five F-distributed statistics are based on the additive LMM (1) and FLMM (5) and (6). The famSKAT and famBT are from Chen et al. (2013) [see also Oualkacha et al., 2013; Schifano et al., 2012].
Table 1:
Abbreviations used in the main text and figures.
| Notation | Description and Interpretation |
|---|---|
| LMM | linear mixed models |
| F_FLMM_BS | F-test of FLMM (5) with the B-Spline basis vs. null model (9) |
| famSKAT | family-based SKAT |
Simulations Investigating Type I Error Rates.
Extensive simulations were carried out, comparing the type I error rates at five nominal significance levels of the five different F-distributed statistics (listed in Table 1) and five LRT statistics, varying the region size from 3 to 30 kb. The empirical type I error rates are reported at five nominal significance levels α = 0.05, 0.01, 0.001, 0.0001, and 0.00001.
In Tables 2 and 3, the results are based on the template C of 75 families. The empirical type I error rates of the F-test statistics of the LMM (1) and FLMM (5) and (6) are generally lower than the nominal levels. Hence, the F-test statistics are conservative and control the type I error rates correctly, regardless of whether the genotype data are smoothed, which basis functions are used to smooth the GVF and β(t), or whether both rare and common variants or only rare variants are used. The empirical type I error rates of the LRT statistics of the FLMM (5) and (6) are around the nominal levels at the α = 0.05, 0.01, 0.001, and 0.0001 levels, but can be higher than the α = 0.00001 nominal level. As the region size and number of variants increase, the type I error rates of the LRT statistics of the FLMM (5) and (6) at the nominal level α = 0.00001 gradually become closer to 0.00001. The empirical type I error rates of the LRT statistics of the LMM (1) are generally higher than the nominal levels.
Table 2:
Empirical type I error rates of the F-distributed statistics and LRT Statistics at nominal levels α = 0.05, 0.01, 0.001, 0.0001, and 0.00001 using the 75 two- or three- generation families with a total of 684 individuals as a template, when region sizes are 3,6, …, 27, 30 kb and some variants are common and the rest are rare. The order of B-spline basis was 4, and the number of basis functions of B-spline was K = Kβ = 20; the number of Fourier basis functions was K = Kβ = 21.
| Region Size (# Variants) | Nominal Level α | F: FLMM (5), B-spline | F: FLMM (5), Fourier | F: FLMM (6), B-spline | F: FLMM (6), Fourier | F: LMM (1) | LRT: FLMM (5), B-spline | LRT: FLMM (5), Fourier | LRT: FLMM (6), B-spline | LRT: FLMM (6), Fourier | LRT: LMM (1) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 kb (59) | 0.05 | 0.021025 | 0.048638 | 0.041797 | 0.047574 | 0.048790 | 0.022653 | 0.051593 | 0.044490 | 0.050484 | 0.051852 |
| | 0.01 | 0.003591 | 0.009408 | 0.007930 | 0.009209 | 0.009476 | 0.004156 | 0.010616 | 0.008997 | 0.010415 | 0.010763 |
| | 0.001 | 0.000291 | 0.000885 | 0.000735 | 0.000852 | 0.000879 | 0.000382 | 0.001118 | 0.000903 | 0.001076 | 0.001120 |
| | 0.0001 | 2.50E-05 | 9.00E-05 | 6.50E-05 | 9.10E-05 | 8.40E-05 | 3.60E-05 | 0.000117 | 9.40E-05 | 0.000113 | 0.000114 |
| | 1.00E-05 | 2.00E-06 | 1.20E-05 | 8.00E-06 | 1.20E-05 | 1.00E-05 | 5.00E-06 | 1.70E-05 | 1.30E-05 | 1.60E-05 | 1.50E-05 |
| 6 kb (117) | 0.05 | 0.044983 | 0.047848 | 0.047674 | 0.047708 | 0.047241 | 0.048454 | 0.051794 | 0.051317 | 0.051651 | 0.052409 |
| | 0.01 | 0.008377 | 0.008924 | 0.008965 | 0.008946 | 0.008757 | 0.009705 | 0.010420 | 0.010412 | 0.010467 | 0.010730 |
| | 0.001 | 0.000747 | 0.000799 | 0.000809 | 0.000794 | 0.000832 | 0.001008 | 0.001095 | 0.001080 | 0.001082 | 0.001182 |
| | 0.0001 | 7.20E-05 | 7.20E-05 | 7.80E-05 | 7.10E-05 | 7.00E-05 | 0.000109 | 0.000123 | 0.000114 | 0.000123 | 0.000134 |
| | 1.00E-05 | 7.00E-06 | 5.00E-06 | 7.00E-06 | 5.00E-06 | 1.10E-05 | 1.60E-05 | 1.10E-05 | 1.50E-05 | 1.10E-05 | 1.90E-05 |
| 9 kb (176) | 0.05 | 0.047737 | 0.047766 | 0.047946 | 0.047773 | 0.046838 | 0.051529 | 0.051828 | 0.051733 | 0.051831 | 0.053811 |
| | 0.01 | 0.009185 | 0.009205 | 0.009251 | 0.009201 | 0.008677 | 0.010717 | 0.010781 | 0.010778 | 0.010782 | 0.011362 |
| | 0.001 | 0.000818 | 0.000841 | 0.000827 | 0.000840 | 0.000815 | 0.001089 | 0.001142 | 0.001104 | 0.001142 | 0.001277 |
| | 0.0001 | 9.70E-05 | 9.20E-05 | 9.60E-05 | 9.20E-05 | 6.90E-05 | 0.000133 | 0.000138 | 0.000132 | 0.000140 | 0.000143 |
| | 1.00E-05 | 1.00E-05 | 9.00E-06 | 1.00E-05 | 9.00E-06 | 6.00E-06 | 2.00E-05 | 1.50E-05 | 2.00E-05 | 1.50E-05 | 1.30E-05 |
| 12 kb (235) | 0.05 | 0.047957 | 0.047567 | 0.047965 | 0.047565 | 0.045992 | 0.051783 | 0.051587 | 0.051810 | 0.051586 | 0.054699 |
| | 0.01 | 0.009034 | 0.009013 | 0.009044 | 0.009013 | 0.008269 | 0.010541 | 0.010506 | 0.010554 | 0.010505 | 0.011560 |
| | 0.001 | 0.000849 | 0.000798 | 0.000849 | 0.000798 | 0.000700 | 0.001134 | 0.001104 | 0.001138 | 0.001104 | 0.001310 |
| | 0.0001 | 7.00E-05 | 6.70E-05 | 7.00E-05 | 6.70E-05 | 5.90E-05 | 0.000110 | 0.000108 | 0.000110 | 0.000108 | 0.000144 |
| | 1.00E-05 | 7.00E-06 | 4.00E-06 | 7.00E-06 | 4.00E-06 | 9.00E-06 | 1.20E-05 | 7.00E-06 | 1.20E-05 | 7.00E-06 | 1.90E-05 |
| 15 kb (293) | 0.05 | 0.048177 | 0.048027 | 0.048184 | 0.048027 | 0.045494 | 0.051983 | 0.052043 | 0.051979 | 0.052043 | 0.055872 |
| | 0.01 | 0.008980 | 0.009161 | 0.008981 | 0.009161 | 0.008136 | 0.010507 | 0.010668 | 0.010510 | 0.010668 | 0.012074 |
| | 0.001 | 0.000789 | 0.000823 | 0.000791 | 0.000823 | 0.000685 | 0.001095 | 0.001105 | 0.001094 | 0.001105 | 0.001393 |
| | 0.0001 | 7.60E-05 | 8.00E-05 | 7.70E-05 | 8.00E-05 | 6.00E-05 | 0.000117 | 0.000133 | 0.000118 | 0.000133 | 0.000170 |
| | 1.00E-05 | 8.00E-06 | 5.00E-06 | 8.00E-06 | 5.00E-06 | 7.00E-06 | 1.20E-05 | 1.00E-05 | 1.20E-05 | 1.00E-05 | 2.20E-05 |
| 18 kb (352) | 0.05 | 0.048041 | 0.048115 | 0.048042 | 0.048115 | 0.044511 | 0.051962 | 0.052075 | 0.051965 | 0.052075 | 0.056474 |
| | 0.01 | 0.009191 | 0.009151 | 0.009190 | 0.009151 | 0.007883 | 0.010711 | 0.010740 | 0.010708 | 0.010740 | 0.012415 |
| | 0.001 | 0.000815 | 0.000798 | 0.000816 | 0.000798 | 0.000637 | 0.001094 | 0.001083 | 0.001095 | 0.001083 | 0.001501 |
| | 0.0001 | 7.00E-05 | 7.50E-05 | 7.00E-05 | 7.50E-05 | 4.70E-05 | 0.000115 | 0.000112 | 0.000115 | 0.000112 | 0.000156 |
| | 1.00E-05 | 1.00E-05 | 7.00E-06 | 1.00E-05 | 7.00E-06 | 2.00E-06 | 1.10E-05 | 1.30E-05 | 1.10E-05 | 1.30E-05 | 2.20E-05 |
| 21 kb (410) | 0.05 | 0.047839 | 0.047807 | 0.047835 | 0.047807 | 0.043991 | 0.051618 | 0.051826 | 0.051613 | 0.051826 | 0.057561 |
| | 0.01 | 0.009033 | 0.008906 | 0.009033 | 0.008906 | 0.007682 | 0.010520 | 0.010481 | 0.010521 | 0.010481 | 0.012735 |
| | 0.001 | 0.000800 | 0.000831 | 0.000800 | 0.000831 | 0.000639 | 0.001058 | 0.001112 | 0.001058 | 0.001112 | 0.001528 |
| | 0.0001 | 7.20E-05 | 6.80E-05 | 7.20E-05 | 6.80E-05 | 4.50E-05 | 0.000117 | 0.000122 | 0.000117 | 0.000122 | 0.000205 |
| | 1.00E-05 | 8.00E-06 | 1.00E-06 | 8.00E-06 | 1.00E-06 | 7.00E-06 | 1.10E-05 | 9.00E-06 | 1.10E-05 | 9.00E-06 | 2.20E-05 |
| 24 kb (469) | 0.05 | 0.048516 | 0.047988 | 0.048517 | 0.047988 | 0.043404 | 0.052276 | 0.051995 | 0.052278 | 0.051995 | 0.058415 |
| | 0.01 | 0.009040 | 0.009008 | 0.009040 | 0.009008 | 0.007647 | 0.010527 | 0.010607 | 0.010528 | 0.010607 | 0.013481 |
| | 0.001 | 0.000817 | 0.000763 | 0.000817 | 0.000763 | 0.000633 | 0.001086 | 0.001061 | 0.001086 | 0.001061 | 0.001752 |
| | 0.0001 | 7.50E-05 | 6.30E-05 | 7.50E-05 | 6.30E-05 | 5.80E-05 | 0.000120 | 0.000107 | 0.000120 | 0.000107 | 0.000235 |
| | 1.00E-05 | 5.00E-06 | 3.00E-06 | 5.00E-06 | 3.00E-06 | 5.00E-06 | 1.30E-05 | 9.00E-06 | 1.30E-05 | 9.00E-06 | 3.20E-05 |
| 27 kb (527) | 0.05 | 0.047956 | 0.047670 | 0.047954 | 0.047670 | 0.042504 | 0.051746 | 0.051641 | 0.051744 | 0.051641 | 0.059221 |
| | 0.01 | 0.009087 | 0.008973 | 0.009087 | 0.008973 | 0.007321 | 0.010540 | 0.010534 | 0.010540 | 0.010534 | 0.013619 |
| | 0.001 | 0.000807 | 0.000779 | 0.000807 | 0.000779 | 0.000577 | 0.001071 | 0.001062 | 0.001071 | 0.001062 | 0.001728 |
| | 0.0001 | 7.50E-05 | 7.00E-05 | 7.50E-05 | 7.00E-05 | 4.90E-05 | 0.000108 | 0.000128 | 0.000108 | 0.000128 | 0.000231 |
| | 1.00E-05 | 5.00E-06 | 4.00E-06 | 5.00E-06 | 4.00E-06 | 5.00E-06 | 1.00E-05 | 1.40E-05 | 1.00E-05 | 1.40E-05 | 3.30E-05 |
| 30 kb (586) | 0.05 | 0.047855 | 0.047912 | 0.047856 | 0.047912 | 0.041902 | 0.051689 | 0.051909 | 0.051690 | 0.051909 | 0.060004 |
| | 0.01 | 0.009045 | 0.008847 | 0.009045 | 0.008847 | 0.007145 | 0.010589 | 0.010473 | 0.010589 | 0.010473 | 0.014220 |
| | 0.001 | 0.000801 | 0.000775 | 0.000801 | 0.000775 | 0.000565 | 0.001088 | 0.001043 | 0.001088 | 0.001043 | 0.001870 |
| | 0.0001 | 7.10E-05 | 7.30E-05 | 7.10E-05 | 7.30E-05 | 3.70E-05 | 0.000109 | 9.90E-05 | 0.000109 | 9.90E-05 | 0.000239 |
| | 1.00E-05 | 3.00E-06 | 5.00E-06 | 3.00E-06 | 5.00E-06 | 4.00E-06 | 1.10E-05 | 9.00E-06 | 1.10E-05 | 9.00E-06 | 2.90E-05 |
Table 3:
Empirical type I error rates of the F-distributed statistics and LRT statistics at nominal levels α = 0.05, 0.01, 0.001, 0.0001, and 0.00001 using the 75 two- or three-generation families with a total of 684 individuals as a template, when region sizes are 3, 6, …, 27, 30 kb and all variants are rare. The order of B-spline basis was 4, and the number of basis functions of B-spline was K = Kβ = 20; the number of Fourier basis functions was K = Kβ = 21.
| Region Size (# Variants) | Nominal Level α | F-distributed, FLMM (5), B-spline | F-distributed, FLMM (5), Fourier | F-distributed, FLMM (6), B-spline | F-distributed, FLMM (6), Fourier | F-distributed, LMM (1) | LRT, FLMM (5), B-spline | LRT, FLMM (5), Fourier | LRT, FLMM (6), B-spline | LRT, FLMM (6), Fourier | LRT, LMM (1) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 kb (53) | 0.05 | 0.011697 | 0.048875 | 0.040898 | 0.048524 | 0.049074 | 0.012473 | 0.051047 | 0.043001 | 0.050660 | 0.051550 |
| | 0.01 | 0.001889 | 0.009581 | 0.007882 | 0.009496 | 0.009570 | 0.002121 | 0.010464 | 0.008663 | 0.010380 | 0.010542 |
| | 0.001 | 0.000126 | 0.000924 | 0.000713 | 0.000921 | 0.000921 | 0.000164 | 0.001118 | 0.000868 | 0.001117 | 0.001103 |
| | 0.0001 | 1.00E-05 | 7.60E-05 | 6.90E-05 | 7.40E-05 | 8.50E-05 | 1.50E-05 | 0.000111 | 8.60E-05 | 0.000106 | 0.000108 |
| | 1.00E-05 | 1.00E-06 | 1.00E-05 | 8.00E-06 | 1.00E-05 | 1.10E-05 | 2.00E-06 | 1.70E-05 | 1.30E-05 | 1.70E-05 | 1.60E-05 |
| 6 kb (106) | 0.05 | 0.034081 | 0.048315 | 0.046020 | 0.047672 | 0.048061 | 0.036685 | 0.051879 | 0.049387 | 0.051260 | 0.052117 |
| | 0.01 | 0.006175 | 0.009086 | 0.008688 | 0.008933 | 0.009108 | 0.007201 | 0.010569 | 0.010014 | 0.010375 | 0.010694 |
| | 0.001 | 0.000540 | 0.000823 | 0.000768 | 0.000814 | 0.000815 | 0.000731 | 0.001090 | 0.001027 | 0.001072 | 0.001121 |
| | 0.0001 | 4.20E-05 | 7.60E-05 | 7.00E-05 | 8.70E-05 | 7.40E-05 | 6.60E-05 | 0.000122 | 0.000109 | 0.000128 | 0.000124 |
| | 1.00E-05 | 4.00E-06 | 8.00E-06 | 6.00E-06 | 8.00E-06 | 6.00E-06 | 8.00E-06 | 1.50E-05 | 9.00E-06 | 1.50E-05 | 1.30E-05 |
| 9 kb (159) | 0.05 | 0.046204 | 0.048242 | 0.048321 | 0.048256 | 0.047596 | 0.049718 | 0.052147 | 0.052003 | 0.052192 | 0.053262 |
| | 0.01 | 0.008830 | 0.009118 | 0.009304 | 0.009124 | 0.008925 | 0.010245 | 0.010711 | 0.010811 | 0.010719 | 0.011090 |
| | 0.001 | 0.000807 | 0.000861 | 0.000844 | 0.000863 | 0.000798 | 0.001112 | 0.001151 | 0.001170 | 0.001150 | 0.001208 |
| | 0.0001 | 8.50E-05 | 9.00E-05 | 9.50E-05 | 9.30E-05 | 7.50E-05 | 0.000124 | 0.000136 | 0.000135 | 0.000136 | 0.000143 |
| | 1.00E-05 | 1.10E-05 | 9.00E-06 | 1.00E-05 | 9.00E-06 | 5.00E-06 | 2.00E-05 | 1.40E-05 | 2.30E-05 | 1.40E-05 | 1.20E-05 |
| 12 kb (212) | 0.05 | 0.047955 | 0.048358 | 0.048393 | 0.048342 | 0.046926 | 0.051767 | 0.052314 | 0.052231 | 0.052306 | 0.054044 |
| | 0.01 | 0.009163 | 0.009249 | 0.009250 | 0.009250 | 0.008645 | 0.010635 | 0.010815 | 0.010732 | 0.010810 | 0.011389 |
| | 0.001 | 0.000835 | 0.000837 | 0.000847 | 0.000835 | 0.000741 | 0.001137 | 0.001139 | 0.001153 | 0.001140 | 0.001235 |
| | 0.0001 | 8.20E-05 | 7.60E-05 | 8.40E-05 | 7.60E-05 | 7.10E-05 | 0.000122 | 0.000123 | 0.000128 | 0.000123 | 0.000146 |
| | 1.00E-05 | 7.00E-06 | 7.00E-06 | 6.00E-06 | 7.00E-06 | 6.00E-06 | 1.80E-05 | 1.20E-05 | 1.80E-05 | 1.20E-05 | 1.50E-05 |
| 15 kb (265) | 0.05 | 0.047998 | 0.047907 | 0.048069 | 0.047897 | 0.046429 | 0.051968 | 0.051793 | 0.052052 | 0.051785 | 0.054614 |
| | 0.01 | 0.009110 | 0.009100 | 0.009130 | 0.009097 | 0.008611 | 0.010577 | 0.010758 | 0.010598 | 0.010758 | 0.011790 |
| | 0.001 | 0.000844 | 0.000871 | 0.000846 | 0.000872 | 0.000718 | 0.001138 | 0.001153 | 0.001143 | 0.001154 | 0.001350 |
| | 0.0001 | 8.90E-05 | 9.00E-05 | 8.90E-05 | 9.10E-05 | 4.90E-05 | 0.000139 | 0.000131 | 0.000139 | 0.000132 | 0.000132 |
| | 1.00E-05 | 8.00E-06 | 6.00E-06 | 8.00E-06 | 6.00E-06 | 8.00E-06 | 1.30E-05 | 1.00E-05 | 1.30E-05 | 1.00E-05 | 2.00E-05 |
| 18 kb (318) | 0.05 | 0.048187 | 0.048045 | 0.048201 | 0.048046 | 0.045493 | 0.051922 | 0.052001 | 0.051946 | 0.052002 | 0.055095 |
| | 0.01 | 0.009141 | 0.009099 | 0.009147 | 0.009100 | 0.008331 | 0.010643 | 0.010660 | 0.010650 | 0.010661 | 0.011957 |
| | 0.001 | 0.000823 | 0.000801 | 0.000825 | 0.000801 | 0.000679 | 0.001109 | 0.001074 | 0.001110 | 0.001074 | 0.001356 |
| | 0.0001 | 7.20E-05 | 6.20E-05 | 7.30E-05 | 6.20E-05 | 5.60E-05 | 0.000125 | 0.000107 | 0.000127 | 0.000107 | 0.000164 |
| | 1.00E-05 | 5.00E-06 | 3.00E-06 | 5.00E-06 | 3.00E-06 | 7.00E-06 | 1.50E-05 | 7.00E-06 | 1.50E-05 | 7.00E-06 | 2.10E-05 |
| 21 kb (371) | 0.05 | 0.047950 | 0.048063 | 0.047966 | 0.048063 | 0.045322 | 0.051684 | 0.052039 | 0.051692 | 0.052039 | 0.056408 |
| | 0.01 | 0.008960 | 0.009025 | 0.008955 | 0.009025 | 0.008135 | 0.010468 | 0.010670 | 0.010469 | 0.010670 | 0.012498 |
| | 0.001 | 0.000835 | 0.000772 | 0.000835 | 0.000772 | 0.000670 | 0.001095 | 0.001084 | 0.001095 | 0.001084 | 0.001456 |
| | 0.0001 | 7.50E-05 | 7.80E-05 | 7.50E-05 | 7.80E-05 | 5.00E-05 | 0.000119 | 0.000119 | 0.000119 | 0.000119 | 0.000168 |
| | 1.00E-05 | 8.00E-06 | 7.00E-06 | 8.00E-06 | 7.00E-06 | 5.00E-06 | 1.10E-05 | 1.00E-05 | 1.10E-05 | 1.00E-05 | 1.50E-05 |
| 24 kb (424) | 0.05 | 0.048216 | 0.048099 | 0.048209 | 0.048099 | 0.045004 | 0.052112 | 0.052018 | 0.052105 | 0.052018 | 0.057104 |
| | 0.01 | 0.008985 | 0.008994 | 0.008984 | 0.008994 | 0.008057 | 0.010551 | 0.010546 | 0.010551 | 0.010546 | 0.012826 |
| | 0.001 | 0.000820 | 0.000860 | 0.000820 | 0.000860 | 0.000727 | 0.001070 | 0.001174 | 0.001071 | 0.001174 | 0.001591 |
| | 0.0001 | 7.20E-05 | 7.00E-05 | 7.20E-05 | 7.00E-05 | 5.10E-05 | 0.000106 | 0.000106 | 0.000106 | 0.000106 | 0.000216 |
| | 1.00E-05 | 5.00E-06 | 1.10E-05 | 5.00E-06 | 1.10E-05 | 7.00E-06 | 1.30E-05 | 1.50E-05 | 1.30E-05 | 1.50E-05 | 2.60E-05 |
| 27 kb (477) | 0.05 | 0.047998 | 0.047927 | 0.047995 | 0.047927 | 0.043921 | 0.051744 | 0.051893 | 0.051743 | 0.051893 | 0.057692 |
| | 0.01 | 0.009037 | 0.009023 | 0.009037 | 0.009023 | 0.007595 | 0.010633 | 0.010649 | 0.010633 | 0.010649 | 0.012803 |
| | 0.001 | 0.000859 | 0.000793 | 0.000859 | 0.000793 | 0.000640 | 0.001129 | 0.001094 | 0.001129 | 0.001094 | 0.001583 |
| | 0.0001 | 7.40E-05 | 7.40E-05 | 7.40E-05 | 7.40E-05 | 4.80E-05 | 0.000114 | 0.000111 | 0.000114 | 0.000111 | 0.000185 |
| | 1.00E-05 | 2.00E-06 | 8.00E-06 | 2.00E-06 | 8.00E-06 | 5.00E-06 | 1.00E-05 | 1.30E-05 | 1.00E-05 | 1.30E-05 | 2.40E-05 |
| 30 kb (530) | 0.05 | 0.047870 | 0.047953 | 0.047869 | 0.047953 | 0.043784 | 0.051694 | 0.051866 | 0.051694 | 0.051866 | 0.058713 |
| | 0.01 | 0.009105 | 0.009155 | 0.009105 | 0.009155 | 0.007572 | 0.010618 | 0.010696 | 0.010618 | 0.010696 | 0.013239 |
| | 0.001 | 0.000804 | 0.000793 | 0.000803 | 0.000793 | 0.000642 | 0.001092 | 0.001065 | 0.001093 | 0.001065 | 0.001641 |
| | 0.0001 | 5.70E-05 | 6.90E-05 | 5.70E-05 | 6.90E-05 | 5.90E-05 | 0.000104 | 0.000116 | 0.000104 | 0.000116 | 0.000227 |
| | 1.00E-05 | 5.00E-06 | 4.00E-06 | 5.00E-06 | 4.00E-06 | 2.00E-06 | 9.00E-06 | 9.00E-06 | 9.00E-06 | 9.00E-06 | 3.20E-05 |
In Tables A.1 and A.2 of Supplementary Materials, we show the type I error rates using template B of 50 families. The empirical type I error rates of the F-test statistics of the LMM (1) and FLMM (5) and (6) are lower than the nominal levels; that is, the F-test statistics are conservative. The empirical type I error rates of the LRT statistics of the FLMM (5) and (6) are around the nominal levels at the 0.05, 0.01, and 0.001 levels, but can be higher than the nominal levels when α = 0.0001 and 0.00001. The empirical type I error rates of the LRT statistics of the LMM (1) are generally higher than the nominal levels.
In Tables A.3 and A.4 of Supplementary Materials, we show the type I error rates using template A of 25 families. The empirical type I error rates of the F-test statistics of the LMM (1) and FLMM (5) and (6) are lower than the nominal levels; that is, the F-test statistics are conservative. The empirical type I error rates of the LRT statistics of the FLMM (5) and (6) are around the nominal levels at the 0.05 and 0.01 levels, but can be higher than the nominal levels when α = 0.001, 0.0001, and 0.00001.
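One way to read the type I error tables is to compare each empirical rate with the Monte Carlo uncertainty of its estimate. The following is a minimal sketch of such a check, not the procedure used in this study. It assumes a hypothetical number of null replicates (`n_rep = 10**6`, which this section does not state) and a normal approximation to the binomial sampling error, which becomes rough at the smallest nominal levels; the function name `classify_rate` is illustrative only.

```python
import math

def classify_rate(empirical, alpha, n_rep, z=1.96):
    """Label an empirical type I error rate relative to the nominal level.

    Uses a normal approximation to the binomial sampling error of a
    rate estimated from n_rep independent null replicates.
    """
    se = math.sqrt(alpha * (1.0 - alpha) / n_rep)
    lower, upper = alpha - z * se, alpha + z * se
    if empirical < lower:
        return "conservative"
    if empirical > upper:
        return "inflated"
    return "consistent with nominal"

# n_rep is an assumption for illustration, not a value taken from the text.
n_rep = 10**6

# Two entries from the 30 kb row of Table 3 at alpha = 0.05.
print(classify_rate(0.047870, 0.05, n_rep))  # F-distributed, FLMM (5), B-spline
print(classify_rate(0.058713, 0.05, n_rep))  # LRT, LMM (1)
```

Under these assumptions the first entry is labeled conservative and the second inflated, which matches the qualitative description in the text. With fewer replicates the band widens and more entries would be labeled as consistent with the nominal level.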
Empirical Power Simulations using the template C of 75 families.
Based on the simulated sequence data, the power of the F-test statistics was compared with the power of the famSKAT and famBT statistics. Figures 1 and 2 report the results when 20%/80% of causal variants have negative/positive effects and region sizes are 6, 12, and 18 kb, respectively. In Figure 1, some variants are common and the rest are rare; in Figure 2, all variants are rare. In Supplementary Materials, we report more results in Figures A.1 - A.10 when some variants are common and the rest are rare, and in Figures A.11 - A.20 when the variants are all rare. In plots (a1) - (a3) of Figures A.1 - A.20, all causal variants have positive effects; when 20%/80% of causal variants have negative/positive effects, we present the results in plots (b1), (b2), and (b3); when 50%/50% of causal variants have negative/positive effects, the results are presented in plots (c1), (c2), and (c3). Therefore, the results of Figure 1 are plots (b1), (b2), and (b3) in Figures A.2, A.4, and A.6, and the results of Figure 2 are plots (b1), (b2), and (b3) in Figures A.12, A.14, and A.16, respectively.
Figure 1:
The empirical power of the F-test statistics, famSKAT, and famBT at α = 0.001 using the 75 two- or three-generation families with a total of 684 individuals as a template, when some variants are common and the rest are rare, 20% of causal variants have negative effects, and the region sizes are 6, 12, and 18 kb, respectively. The order of B-spline basis was 4, and the number of basis functions of B-spline was K = Kβ = 20; the number of Fourier basis functions was K = Kβ = 21. Abbreviation: Causal_pct means percentage of causal variants.
Figure 2:
The empirical power of the F-test statistics, famSKAT, and famBT at α = 0.001 using the 75 two- or three-generation families with a total of 684 individuals as a template, when all variants are rare, 20% of causal variants have negative effects, and the region sizes are 6, 12, and 18 kb, respectively. The order of B-spline basis was 4, and the number of basis functions of B-spline was K = Kβ = 20; the number of Fourier basis functions was K = Kβ = 21. Abbreviation: Causal_pct means percentage of causal variants.
When some variants are common and the rest are rare, the F-test statistics of LMM (1) and FLMM (5) and (6) have higher power than the kernel and burden tests famSKAT and famBT in Figures 1 and A.1 - A.10. The four F-test statistics of FLMM (5) and (6) perform similarly, while the F-test statistic of LMM (1) performs the best. If all variants are rare and the region sizes are 3 and 6 kb, the F-test statistics of LMM (1) and FLMM (5) and (6) have power similar to that of famSKAT in plots (a1), (a2), and (a3) of Figure 2, and Figures A.11 and A.12. If all variants are rare and the region sizes are between 9 and 30 kb, the F-test statistic of LMM (1) has the highest power, while famSKAT performs similarly to or better than the F-test statistics of FLMM (5) and (6) in plots (b1), (b2), (b3), (c1), (c2), and (c3) of Figure 2, and Figures A.13 - A.20.
The high power levels of the F-test statistic of LMM (1) in Figures 2 and A.11 - A.20 show that the LMM (1) are useful in analyzing rare variants. When some variants are common and the rest are rare, the LMM (1) are also powerful, especially in the presence of a large number of variants. The famSKAT has higher power than the burden test famBT. The four F-test statistics of FLMM (5) and (6) have similarly good power levels. The power levels of the F-test statistics of the beta-smooth only FLMM (6) are almost identical to those of the F-test statistics of FLMM (5), which smooth both the genetic variant functions Xi(u) and the genetic effect function β(t), regardless of basis choice. Hence, the four F-test statistics of FLMM (5) and (6) are very stable in terms of power performance: they depend strongly neither on whether the genotype data are smoothed nor on which basis functions are used.
3.2. Analysis of Refractive Error Data in the Myopia Family Study
We carried out gene-based tests to investigate genes on autosomes that may affect the variation of refractive error, using the F-test and LRT statistics together with the family-based kernel test (famSKAT) and burden test (famBT) (Chen et al., 2013). Quantile-quantile (Q-Q) plots of the gene-based statistics in Figure 3 show that the F-test and LRT statistics, famSKAT, and famBT had similar λGC values. In Table 4, the strongest association was detected between refractive error and NAV2, with p-values of the FLMM (5) and (6) tests less than 0.0001. Associations were also detected for the genes HSPG2;CELA3B, HP1BP3, TEC, SLC9A1;WDTC1, and CADM1;LOC101928985, with p-values of the LMM (1) and FLMM (5) and (6) tests smaller than or around 0.001. Interestingly, famSKAT and famBT show an association signal at HSPG2;CELA3B, but not at the other genes. We note that none of the genes shows a significant association after a Bonferroni correction (0.05/8282 = 6.04 × 10^−6) in this moderately sized sample of 36 pedigrees and 300 genotyped/phenotyped individuals.
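The Bonferroni threshold and the genomic-control factor mentioned above can be reproduced with a few lines of arithmetic. The sketch below is illustrative and uses only the Python standard library. The `lambda_gc` function follows one common definition (median 1-df chi-squared statistic divided by the median of the 1-df chi-squared distribution); the text does not state how λGC was computed in this study, so that choice is an assumption, as are the function names and the evenly spaced stand-in p-values.

```python
from statistics import NormalDist, median

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests

def lambda_gc(p_values):
    """Genomic-control inflation factor from a list of p-values.

    Each p-value is converted to a 1-df chi-squared quantile, and the
    median is divided by the median of the 1-df chi-squared distribution.
    """
    nd = NormalDist()
    # Upper-tail 1-df chi-squared quantile: (z_{1 - p/2})^2.
    chi2 = [nd.inv_cdf(1.0 - p / 2.0) ** 2 for p in p_values]
    expected_median = nd.inv_cdf(0.75) ** 2  # about 0.455
    return median(chi2) / expected_median

threshold = bonferroni_threshold(0.05, 8282)
print(f"{threshold:.2e}")  # 6.04e-06

# The smallest NAV2 p-value in Table 4 (4.86E-05) does not pass the threshold.
print(4.86e-05 < threshold)  # False

# Evenly spaced p-values stand in for a well-calibrated test (made-up data).
uniform_p = [(i + 0.5) / 1000 for i in range(1000)]
print(round(lambda_gc(uniform_p), 2))  # 1.0
```

A value of λGC close to 1 indicates that the bulk of the test statistics behaves as expected under the null hypothesis.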
Figure 3:
Q-Q plots for the F-test and LRT statistics, famSKAT, and famBT for the Myopia Family Study data.
Table 4:
Results of association analysis of refractive error data in the Myopia Family Study. The association results were included if a p-value is smaller than or around 10^−3. Abbreviation: Chr = chromosome.
| Chr | Gene | Start | End | Number of Variants | F-distributed, FLMM (5), B-spline | F-distributed, FLMM (5), Fourier | F-distributed, FLMM (6), B-spline | F-distributed, FLMM (6), Fourier | F-distributed, Additive LMM (1) | LRT, FLMM (5), B-spline | LRT, FLMM (5), Fourier | LRT, FLMM (6), B-spline | LRT, FLMM (6), Fourier | LRT, Additive LMM (1) | famSKAT | famBT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 11 | NAV2 | 19569563 | 20129311 | 13 | 8.24E-05 | 9.07E-05 | 8.24E-05 | 9.07E-05 | 0.002558 | 4.86E-05 | 5.15E-05 | 4.86E-05 | 5.15E-05 | 0.001704 | 0.317707 | 0.554589 |
4. Discussion
In this paper, we develop LMM- and FLMM-based gene-level tests of association between a quantitative trait and genetic variants on pedigrees. In the models, the effect of a major gene is modeled as a fixed effect, the contribution of polygenes is modeled as a separate random variation, and the correlation of pedigree members is modeled by inbreeding/kinship coefficients. A Cholesky decomposition is used to make the traits standard normal. Then, F-distributed statistics and LRT statistics based on the LMM and FLMM are built to test for association between the quantitative trait and the genetic variants. By simulation, we show that the F-distributed statistics are conservative and control type I errors correctly. The proposed models are useful in whole genome and whole exome association studies of complex traits.
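The Cholesky step can be illustrated on a single small family. The sketch below is a toy example under stated assumptions, not the authors' implementation: it assumes the usual polygenic covariance form Σ = σ_G² · 2Φ + σ_e² · I, with Φ the kinship matrix, and uses hypothetical variance components and made-up trait values for a father, mother, and child. The helper names `cholesky` and `forward_solve` are illustrative. With Σ = L Lᵀ, the transformed vector L⁻¹y has identity covariance, which the last lines check numerically.

```python
import math

def cholesky(a):
    """Lower-triangular L with L L^T = a, for a symmetric positive-definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def forward_solve(low, b):
    """Solve low x = b by forward substitution."""
    x = []
    for i, row in enumerate(low):
        s = sum(row[k] * x[k] for k in range(i))
        x.append((b[i] - s) / row[i])
    return x

# Twice the kinship matrix for a father, mother, and child (no inbreeding).
two_phi = [[1.0, 0.0, 0.5],
           [0.0, 1.0, 0.5],
           [0.5, 0.5, 1.0]]
sigma_g2, sigma_e2 = 0.6, 0.4  # hypothetical variance components
n = len(two_phi)

sigma = [[sigma_g2 * two_phi[i][j] + (sigma_e2 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
low = cholesky(sigma)

# Transform a centered trait vector: y_star = L^{-1} y (values are made up).
y_centered = [0.2, -1.1, 0.5]
y_star = forward_solve(low, y_centered)

# Check the whitening property L^{-1} Sigma L^{-T} = I.
# First A = L^{-1} Sigma, solved one column of Sigma at a time.
a_cols = [forward_solve(low, [sigma[i][j] for i in range(n)]) for j in range(n)]
a_rows = [[a_cols[j][i] for j in range(n)] for i in range(n)]
# Then row i of A L^{-T} is the solution w of L w = (row i of A).
whitened = [forward_solve(low, a_rows[i]) for i in range(n)]
print([[round(v, 10) for v in row] for row in whitened])
```

In practice the same transformation would also be applied to the design matrix so that the transformed model has independent errors; a numerical library would normally replace the hand-written routines.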
The F-test statistics of LMM have similar or higher power than the FLMM, kernel-based famSKAT, and burden test famBT. The FLMM perform well when analyzing a combination of rare and common variants. The kernel-based famSKAT performs better than burden test famBT. In our previous work, we showed that the tests of fixed effect regression models have higher power than SKAT for population data in major gene association studies (Fan et al., 2016b). Therefore, our models provide an alternative competitive method for carrying out gene-based association tests based on next-generation sequencing data.
For small samples of only 25 pedigrees (template A), with K = Kβ = 20 B-spline basis functions and K = Kβ = 21 Fourier basis functions, the LRT statistics of the FLMM (5) and (6) control type I errors correctly at the 0.05 and 0.01 levels, but can inflate type I errors when α = 0.001, 0.0001, and 0.00001. Hence, the LRT statistics of the FLMM (5) and (6) can be used in candidate gene analysis for small samples. When the sample size increases, the LRT statistics of the FLMM (5) and (6) control the type I error rates at the lower levels α = 0.001 and 0.0001, and they can be used in genome-wide or exome-wide analysis. The empirical type I error rates of the LRT statistics of the LMM (1) are generally higher than the nominal levels.
In Svishcheva et al. (2015), FLMM were proposed to test association using F-distributed statistics which are essentially our models (2) and (3). However, Svishcheva et al. (2015) did not include our LMM (1), which actually performs the best among the models we considered. In addition, we examine the performance of the LRT statistics of LMM and FLMM, and show that the LRT statistics of FLMM are useful in candidate gene analysis for small samples and in genome-wide or exome-wide analysis if the sample sizes are moderate or large.
Supplementary Material
Table A.1: Empirical type I error rates of the F-distributed statistics and LRT statistics at nominal levels α = 0.05, 0.01, 0.001, 0.0001, and 0.00001 using the 50 two- or three-generation families with a total of 456 individuals as a template, when region sizes are 3, 6, …, 27, 30 kb and some variants are common and the rest are rare. The order of B-spline basis was 4, and the number of basis functions of B-spline was K = Kβ = 20; the number of Fourier basis functions was K = Kβ = 21.
Table A.2: Empirical type I error rates of the F-distributed statistics and LRT statistics at nominal levels α = 0.05, 0.01, 0.001, 0.0001, and 0.00001 using the 50 two- or three-generation families with a total of 456 individuals as a template, when region sizes are 3, 6, …, 27, 30 kb and all variants are rare. The order of B-spline basis was 4, and the number of basis functions of B-spline was K = Kβ = 20; the number of Fourier basis functions was K = Kβ = 21.
Table A.3: Empirical type I error rates of the F-distributed statistics and LRT statistics at nominal levels α = 0.05, 0.01, 0.001, 0.0001, and 0.00001 using the 25 two- or three-generation families with a total of 228 individuals as a template, when region sizes are 3, 6, …, 27, 30 kb and some variants are common and the rest are rare. The order of B-spline basis was 4, and the number of basis functions of B-spline was K = Kβ = 20; the number of Fourier basis functions was K = Kβ = 21.
Table A.4: Empirical type I error rates of the F-distributed statistics and LRT statistics at nominal levels α = 0.05, 0.01, 0.001, 0.0001, and 0.00001 using the 25 two- or three-generation families with a total of 228 individuals as a template, when region sizes are 3, 6, …, 27, 30 kb and all variants are rare. The order of B-spline basis was 4, and the number of basis functions of B-spline was K = Kβ = 20; the number of Fourier basis functions was K = Kβ = 21.
Figure A.1: The empirical power of the F-test statistics, famSKAT, and famBT at α = 0.001 using the 75 two- or three-generation families with a total of 684 individuals as a template, when some variants are common and the rest are rare and the region size is 3 kb. The order of B-spline basis was 4, and the number of basis functions of B-spline was K = Kβ = 20; the number of Fourier basis functions was K = Kβ = 21. Abbreviations: Neg_beta_pct means percentage of causal variants which have negative effects; Causal_pct means percentage of causal variants.
Figure A.2: The empirical power of the F-test statistics, famSKAT, and famBT at α = 0.001 using the 75 two- or three-generation families with a total of 684 individuals as a template, when some variants are common and the rest are rare and the region size is 6 kb. The order of B-spline basis was 4, and the number of basis functions of B-spline was K = Kβ = 20; the number of Fourier basis functions was K = Kβ = 21. Abbreviations: Neg_beta_pct means percentage of causal variants which have negative effects; Causal_pct means percentage of causal variants.
Figure A.3: The empirical power of the F-test statistics, famSKAT, and famBT at α = 0.001 using the 75 two- or three-generation families with a total of 684 individuals as a template, when some variants are common and the rest are rare and the region size is 9 kb. The order of B-spline basis was 4, and the number of basis functions of B-spline was K = Kβ = 20; the number of Fourier basis functions was K = Kβ = 21. Abbreviations: Neg_beta_pct means percentage of causal variants which have negative effects; Causal_pct means percentage of causal variants.
Figure A.4: The empirical power of the F-test statistics, famSKAT, and famBT at α = 0.001 using the 75 two- or three-generation families with a total of 684 individuals as a template, when some variants are common and the rest are rare and the region size is 12 kb. The order of B-spline basis was 4, and the number of basis functions of B-spline was K = Kβ = 20; the number of Fourier basis functions was K = Kβ = 21. Abbreviations: Neg_beta_pct means percentage of causal variants which have negative effects; Causal_pct means percentage of causal variants.
Figure A.5: The empirical power of the F-test statistics, famSKAT, and famBT at α = 0.001 using the 75 two- or three-generation families with a total of 684 individuals as a template, when some variants are common and the rest are rare and the region size is 15 kb. The order of B-spline basis was 4, and the number of basis functions of B-spline was K = Kβ = 20; the number of Fourier basis functions was K = Kβ = 21. Abbreviations: Neg_beta_pct means percentage of causal variants which have negative effects; Causal_pct means percentage of causal variants.
Figure A.6: The empirical power of the F-test statistics, famSKAT, and famBT at α = 0.001 using the 75 two- or three-generation families with a total of 684 individuals as a template, when some variants are common and the rest are rare and the region size is 18 kb. The order of B-spline basis was 4, and the number of basis functions of B-spline was K = Kβ = 20; the number of Fourier basis functions was K = Kβ = 21. Abbreviations: Neg_beta_pct means percentage of causal variants which have negative effects; Causal_pct means percentage of causal variants.
Figure A.7: The empirical power of the F-test statistics, famSKAT, and famBT at α = 0.001 using the 75 two- or three-generation families with a total of 684 individuals as a template, when some variants are common and the rest are rare and the region size is 21 kb. The order of B-spline basis was 4, and the number of basis functions of B-spline was K = Kβ = 20; the number of Fourier basis functions was K = Kβ = 21. Abbreviations: Neg_beta_pct means percentage of causal variants which have negative effects; Causal_pct means percentage of causal variants.
Figure A.8: The empirical power of the F-test statistics, famSKAT, and famBT at α = 0.001 using the 75 two- or three-generation families with a total of 684 individuals as a template, when some variants are common and the rest are rare and the region size is 24 kb. The order of B-spline basis was 4, and the number of basis functions of B-spline was K = Kβ = 20; the number of Fourier basis functions was K = Kβ = 21. Abbreviations: Neg_beta_pct means percentage of causal variants which have negative effects; Causal_pct means percentage of causal variants.
Figure A.9: The empirical power of the F-test statistics, famSKAT, and famBT at α = 0.001 using the 75 two- or three-generation families with a total of 684 individuals as a template, when some variants are common and the rest are rare and the region size is 27 kb. The order of B-spline basis was 4, and the number of basis functions of B-spline was K = Kβ = 20; the number of Fourier basis functions was K = Kβ = 21. Abbreviations: Neg_beta_pct means percentage of causal variants which have negative effects; Causal_pct means percentage of causal variants.
Figure A.10: The empirical power of the F-test statistics, famSKAT, and famBT at α = 0.001 using the 75 two- or three-generation families with a total of 684 individuals as a template, when some variants are common and the rest are rare and the region size is 30 kb. The order of B-spline basis was 4, and the number of basis functions of B-spline was K = Kβ = 20; the number of Fourier basis functions was K = Kβ = 21. Abbreviations: Neg_beta_pct means percentage of causal variants which have negative effects; Causal_pct means percentage of causal variants.
Figure A.11: The empirical power of the F-test statistics, famSKAT, and famBT at α = 0.001 using the 75 two- or three-generation families with a total of 684 individuals as a template, when all variants are rare and the region size is 3 kb. The order of B-spline basis was 4, and the number of basis functions of B-spline was K = Kβ = 20; the number of Fourier basis functions was K = Kβ = 21. Abbreviations: Neg_beta_pct means percentage of causal variants which have negative effects; Causal_pct means percentage of causal variants.
Figure A.12: The empirical power of the F-test statistics, famSKAT, and famBT at α = 0.001 using the 75 two- or three-generation families with a total of 684 individuals as a template, when all variants are rare and the region size is 6 kb. The order of B-spline basis was 4, and the number of basis functions of B-spline was K = Kβ = 20; the number of Fourier basis functions was K = Kβ = 21. Abbreviations: Neg_beta_pct means percentage of causal variants which have negative effects; Causal_pct means percentage of causal variants.
Figure A.13: The empirical power of the F-test statistics, famSKAT, and famBT at α = 0.001 using the 75 two- or three-generation families with a total of 684 individuals as a template, when all variants are rare and the region size is 9 kb. The order of B-spline basis was 4, and the number of basis functions of B-spline was K = Kβ = 20; the number of Fourier basis functions was K = Kβ = 21. Abbreviations: Neg_beta_pct means percentage of causal variants which have negative effects; Causal_pct means percentage of causal variants.
Figure A.14: The empirical power of the F-test statistics, famSKAT, and famBT at α = 0.001 using the 75 two- or three-generation families with a total of 684 individuals as a template, when all variants are rare and the region size is 12 kb. The order of B-spline basis was 4, and the number of basis functions of B-spline was K = Kβ = 20; the number of Fourier basis functions was K = Kβ = 21. Abbreviations: Neg_beta_pct means percentage of causal variants which have negative effects; Causal_pct means percentage of causal variants.
Figure A.15: The empirical power of the F-test statistics, famSKAT, and famBT at α = 0.001 using the 75 two- or three-generation families with a total of 684 individuals as a template, when all variants are rare and the region size is 15 kb. The order of B-spline basis was 4, and the number of basis functions of B-spline was K = Kβ = 20; the number of Fourier basis functions was K = Kβ = 21. Abbreviations: Neg_beta_pct means percentage of causal variants which have negative effects; Causal_pct means percentage of causal variants.
Figure A.16: The empirical power of the F-test statistics, famSKAT, and famBT at α = 0.001 using the 75 two- or three-generation families with a total of 684 individuals as a template, when all variants are rare and the region size is 18 kb. The order of B-spline basis was 4, and the number of basis functions of B-spline was K = Kβ = 20; the number of Fourier basis functions was K = Kβ = 21. Abbreviations: Neg_beta_pct means percentage of causal variants which have negative effects; Causal_pct means percentage of causal variants.
Figure A.17: The empirical power of the F-test statistics, famSKAT, and famBT at α = 0.001 using the 75 two- or three-generation families with a total of 684 individuals as a template, when all variants are rare and the region size is 21 kb. The order of B-spline basis was 4, and the number of basis functions of B-spline was K = Kβ = 20; the number of Fourier basis functions was K = Kβ = 21. Abbreviations: Neg_beta_pct means percentage of causal variants which have negative effects; Causal_pct means percentage of causal variants.
Figure A.18: The empirical power of the F-test statistics, famSKAT, and famBT at α = 0.001 using the 75 two- or three-generation families with a total of 684 individuals as a template, when all variants are rare and the region size is 24 kb. The order of B-spline basis was 4, and the number of basis functions of B-spline was K = Kβ = 20; the number of Fourier basis functions was K = Kβ = 21. Abbreviations: Neg_beta_pct means percentage of causal variants which have negative effects; Causal_pct means percentage of causal variants.
Figure A.19: The empirical power of the F-test statistics, famSKAT, and famBT at α = 0.001 using the 75 two- or three-generation families with a total of 684 individuals as a template, when all variants are rare and the region size is 27 kb. The order of B-spline basis was 4, and the number of basis functions of B-spline was K = Kβ = 20; the number of Fourier basis functions was K = Kβ = 21. Abbreviations: Neg_beta_pct means percentage of causal variants which have negative effects; Causal_pct means percentage of causal variants.
Figure A.20: The empirical power of the F-test statistics, famSKAT, and famBT at α = 0.001 using the 75 two- or three-generation families with a total of 684 individuals as a template, when all variants are rare and the region size is 30 kb. The order of B-spline basis was 4, and the number of basis functions of B-spline was K = Kβ = 20; the number of Fourier basis functions was K = Kβ = 21. Abbreviations: Neg_beta_pct means percentage of causal variants which have negative effects; Causal_pct means percentage of causal variants.
Acknowledgments:
This study was supported by the Intramural Research Program of the National Human Genome Research Institute (Alexander F. Wilson and Joan E. Bailey-Wilson), and by the Intramural Research Program of the National Institute of Mental Health (Francis J. McMahon), National Institutes of Health (NIH), Bethesda, MD. This research was also supported by Wei Chen's NIH grant R01EY024226 and Yunnan Applied Basic Research Projects, China (No. U0120170557). This study utilized the high-performance computational capabilities of the Biowulf/Linux cluster at the NIH, Bethesda, MD (http://biowulf.nih.gov).
Footnotes
Computer Program. The methods proposed in this paper are implemented using functional data analysis (fda) procedures in the statistical package R. The R code for data analysis and simulations is available at http://www.nichd.nih.gov/about/org/diphr/bbb/software/fan/Pages/default.aspx
References
- Abecasis GR, Auton A, Brooks LD, DePristo MA, Durbin RM, Handsaker RE, ⋯ 1000 Genomes Project Consortium (2012). An integrated map of genetic variation from 1,092 human genomes. Nature, 491, 56–65.
- Amos CI (1994). Robust variance-components approach for assessing linkage in pedigrees. American Journal of Human Genetics, 54, 534–543.
- Astle W, & Balding DJ (2009). Population structure and cryptic relatedness in genetic association studies. Statistical Science, 24, 451–471.
- Aulchenko YS, de Koning DJ, & Haley C (2007). Genomewide rapid association using mixed model and regression: a fast and simple method for genomewide pedigree-based quantitative trait loci association analysis. Genetics, 177, 577–585.
- Chen H, Meigs JB, & Dupuis J (2013). Sequence kernel association test for quantitative traits in family samples. Genetic Epidemiology, 37(2), 196–204.
- de Boor C (2001). A Practical Guide to Splines. Applied Mathematical Sciences, 27, revised edition. New York: Springer.
- Fan RZ, Wang YF, Mills JL, Wilson AF, Bailey-Wilson JE, & Xiong MM (2013). Functional linear models for association analysis of quantitative traits. Genetic Epidemiology, 37, 726–742.
- Fan RZ, Wang YF, Mills JL, Carter TC, Lobach I, Wilson AF, ⋯ Xiong MM (2014). Generalized functional linear models for case-control association studies. Genetic Epidemiology, 38, 622–637.
- Fan RZ, Wang YF, Qi Y, Ding Y, Weeks DE, Lu ZH, ⋯ Chen W (2016a). Gene-based association analysis for censored traits via functional regressions. Genetic Epidemiology, 40, 133–143.
- Fan RZ, Chiu CY, Jung JS, Weeks DE, Wilson AF, Bailey-Wilson JE, ⋯ Xiong MM (2016b). A comparison study of fixed and mixed effect models for gene level association studies of complex traits. Genetic Epidemiology, 40, 702–721.
- Ferraty F, & Romain Y (2010). The Oxford Handbook of Functional Data Analysis. New York: Oxford University Press.
- Han F, & Pan W (2010). A data-adaptive sum test for disease association with multiple common or rare variants. Human Heredity, 70, 42–54.
- Horváth L, & Kokoszka P (2012). Inference for Functional Data With Applications. New York: Springer.
- Ionita-Laza I, Lee S, Makarov V, Buxbaum JD, & Lin X (2013). Sequence kernel association tests for the combined effect of rare and common variants. American Journal of Human Genetics, 92, 841–853.
- Jiang YD, Chiu CY, Yan Q, Chen W, Gorin MB, Conley YP, ⋯ Fan RZ (2018). Gene-based association testing of dichotomous traits with generalized functional linear mixed models using extended pedigrees.
- Kang HM, Sul JH, Service SK, Zaitlen NA, Kong SY, Freimer NB, Sabatti C, & Eskin E (2010). Variance component model to account for sample structure in genome-wide association studies. Nature Genetics, 42, 348–354.
- Korte A, Vilhjalmsson BJ, Segura V, Platt A, Long Q, & Nordborg M (2012). A mixed-model approach for genomewide association studies of correlated traits in structured populations. Nature Genetics, 44, 1066–1071.
- Lange K (2002). Mathematical and Statistical Methods for Genetic Analysis, 2nd edition. Springer.
- Lee S, Emond MJ, Bamshad MJ, Barnes KC, Rieder MJ, Nickerson DA, ⋯ Lin X (2012). Optimal unified approach for rare-variant association testing with application to small-sample case-control whole-exome sequencing studies. American Journal of Human Genetics, 91, 224–237.
- Lek M, Karczewski KJ, Minikel EV, Samocha KE, Banks E, Fennell T, ⋯ Exome Aggregation Consortium (2016). Analysis of protein-coding genetic variation in 60,706 humans. Nature, 536, 285–291.
- Li B, & Leal SM (2008). Methods for detecting associations with rare variants for common diseases: application to analysis of sequence data. American Journal of Human Genetics, 83, 311–321.
- Lippert C, Listgarten J, Liu Y, Kadie CM, Davidson RI, & Heckerman D (2011). FaST linear mixed models for genome-wide association studies. Nature Methods, 8, 833–835.
- Listgarten J, Lippert C, Kadie CM, Davidson RI, Eskin E, & Heckerman D (2012). Improved linear mixed models for genome-wide association studies. Nature Methods, 9, 525–526.
- Luo L, Boerwinkle E, & Xiong MM (2011). Association studies for next-generation sequencing. Genome Research, 21, 1099–1108.
- Luo L, Zhu Y, & Xiong MM (2012). Quantitative trait locus analysis for next-generation sequencing with the functional linear models. Journal of Medical Genetics, 49, 513–524.
- Luo L, Zhu Y, & Xiong MM (2013). Smoothed functional principal component analysis for testing association of the entire allelic spectrum of genetic variation. European Journal of Human Genetics, 21, 217–224.
- Madsen BE, & Browning SR (2009). A groupwise association test for rare mutations using a weighted sum statistic. PLoS Genetics, 5, e1000384.
- Morgenthaler S, & Thilly WG (2007). A strategy to discover genes that carry multi-allelic or mono-allelic risk for common diseases: a cohort allelic sums test (CAST). Mutation Research, 615, 28–56.
- Morris AP, & Zeggini E (2010). An evaluation of statistical approaches to rare variant analysis in genetic association studies. Genetic Epidemiology, 34, 188–193.
- Musolf AM, Simpson CL, Moiz BA, Long KA, Portas L, Murgia F, ⋯ Bailey-Wilson JE (2017). Caucasian families exhibit significant linkage of myopia to chromosome 11p. Invest Ophthalmol Vis Sci, 58(9), 3547–3554.
- Musolf AM, Simpson CL, Long KA, Moiz BA, Lewis DD, Middlebrooks CD, ⋯ Stambolian D (2018). Myopia in Chinese families shows linkage to 10q26.13. Mol Vis, 24, 29–42.
- Oualkacha K, Dastani Z, Li R, Cingolani PE, Spector TD, Hammond CJ, Richards JB, Ciampi A, & Greenwood CM (2013). Adjusted sequence kernel association test for rare variants controlling for cryptic and family relatedness. Genetic Epidemiology, 37(4), 366–376.
- Price AL, Kryukov GV, de Bakker PIW, Purcell SM, Staples J, Wei LJ, & Sunyaev SR (2010). Pooled association tests for rare variants in exon-resequencing studies. American Journal of Human Genetics, 86, 832–838.
- Ramsay JO, Hooker G, & Graves S (2009). Functional Data Analysis With R and Matlab. New York: Springer.
- Ramsay JO, & Silverman BW (2005). Functional Data Analysis, Second Edition. New York: Springer.
- Ross SM (1996). Stochastic Processes, Second Edition. New York: John Wiley & Sons.
- Rusk N, & Kiermer V (2008). Primer: Sequencing - the next generation. Nature Methods, 5, 15.
- Schaffner SF, Foo C, Gabriel S, Reich D, Daly MJ, & Altshuler D (2005). Calibrating a coalescent simulation of human genome sequence variation. Genome Research, 15, 1576–1583.
- Schifano ED, Epstein MP, Bielak LF, Jhun MA, Kardia SL, Peyser PA, & Lin X (2012). SNP set association analysis for familial data. Genetic Epidemiology, 36(8), 797–810.
- Svishcheva GR, Belonogova NM, & Axenovich TI (2015). Region-based association test for familial data under functional linear models. PLoS ONE, 10, e0128999.
- Tennessen JA, Bigham AW, O'Connor TD, Fu W, Kenny EE, Gravel S, ⋯ Broad GO, Seattle GO, NHLBI Exome Sequencing Project (2012). Evolution and functional impact of rare coding variation from deep sequencing of human exomes. Science, 337, 64–69.
- The International HapMap Consortium (2007). A second generation human haplotype map of over 3.1 million SNPs. Nature, 449, 851–861.
- Vsevolozhskaya OA, Zaykin DV, Greenwood MC, Wei C, & Lu Q (2014). Functional analysis of variance for association studies. PLoS ONE, 9, e105074.
- Vsevolozhskaya OA, Zaykin DV, Barondess DA, Tong X, Jadhav S, & Lu Q (2016). Uncovering local trends in genetic effects of multiple phenotypes via functional linear models. Genetic Epidemiology, 40, 210–221.
- Wojciechowski RJ, Bailey-Wilson JE, & Stambolian D (2009a). Fine-mapping of candidate region in Amish and Ashkenazi families confirms linkage of refractive error to a QTL on 1p34-p36. Mol Vis, 15, 1398–406.
- Wojciechowski RJ, Stambolian D, Ciner EB, Ibay G, Holmes TN, & Bailey-Wilson JE (2009b). Genomewide linkage scans for ocular refraction and meta-analysis of four populations in the Myopia Family Study. Invest Ophthalmol Vis Sci, 50(5), 2024–32.
- Wu MC, Lee S, Cai T, Li Y, Boehnke M, & Lin X (2011). Rare-variant association testing for sequencing data with the sequence kernel association test. American Journal of Human Genetics, 89, 82–93.
- Yang J, Zaitlen NA, Goddard ME, Visscher PM, & Price AL (2014). Advantages and pitfalls in the application of mixed-model association methods. Nature Genetics, 46, 100–106.
- Yu J, Pressoir G, Briggs WH, Vroh Bi I, Yamasaki M, Doebley JF, ⋯ Holland JB (2006). A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nature Genetics, 38, 203–208.
- Zhou X, & Stephens M (2012). Genome-wide efficient mixed-model analysis for association studies. Nature Genetics, 44, 821–824.



